Systems and methods for identifying a person's native language and/or non-native language based on code-switched text and/or speech, are presented. The systems may be trained using various methods. For example, a language identification system may be trained using one or more code-switched corpora. Text and/or speech features may be extracted from the corpora and used, in combination with a per-word language identify of the text and/or speech, to train at least one machine learner. Code-switched text and/or speech may be received and processed by extracting text and/or speech features. These features may be fed into the at least one machine learner to identify the person's native language.