The bilingual nature of the Maltese Islands gives rise to frequent occurrences of code switching, both verbally and in writing. In designing a polyglot TTS system capable of handling SMS messages within the local context, it was necessary to come up with a pre-processing mechanism for identifying the language of origin of individual word tokens. Given that certain common words can be interlingually ambiguous and that the domain under consideration is both open and subject to containing various word contractions and spelling mistakes, the task is not as straightforward as it may seem at first. In this paper we discuss a language neutral language identification approach capable of handling the characteristics of the domain in a robust fashion.
Bibliographic reference. Rosner, Mike / Farrugia, Paulseph-John (2007): "A tagging algorithm for mixed language identification in a noisy domain", In INTERSPEECH-2007, 190-193.