2nd Workshop on Spoken Language Technologies for Under-Resourced Languages

Universiti Sains, Penang, Malaysia
May 3-5, 2010

Language Identification of Code Switching Malay-English Words Using Syllable Structure Information

Yin-Lai Yeong, Tien-Ping Tan

School of Computer Science, Universiti Sains Malaysia, Penang, Malaysia

This paper introduces a language identification approach for code-switched Malay-English words that uses syllable structure information. We also review and compare other approaches, most of which rely on linguistic information. The information used for language identification includes Malay affixation information, an English vocabulary list, alphabet n-grams, and grapheme n-grams. The approach using syllable structure information achieves the highest accuracy, 93.73%, an improvement of 1.91% over the second-best approach compared in this paper. These results indicate that syllable structure information is effective for language identification.
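The abstract does not say how syllable structure is turned into a decision. The sketch below is one possible reading, not the authors' method. It assumes the simplified template commonly given for native Malay syllables, (C)V(C) (that is, V, CV, VC, or CVC). It also assumes that the digraphs "ng", "ny", "sy", "kh" and "gh" count as single consonants. A word whose consonant/vowel pattern can be split entirely into such syllables is labelled Malay, and any other word is labelled English. The names `cv_pattern`, `fits_malay_syllables` and `identify` are hypothetical.

```python
import re

# Assumption: Malay digraphs that represent a single consonant sound.
DIGRAPHS = ("ng", "ny", "sy", "kh", "gh")
VOWELS = set("aeiou")

# Assumption: native Malay syllables follow (C)V(C), i.e. V, CV, VC or CVC.
MALAY_SYLLABLES = re.compile(r"(?:C?VC?)+")


def cv_pattern(word: str) -> str:
    """Map a word to a string of C (consonant) and V (vowel) symbols."""
    word = word.lower()
    symbols = []
    i = 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            symbols.append("C")  # digraph counted as one consonant
            i += 2
        elif word[i] in VOWELS:
            symbols.append("V")
            i += 1
        elif word[i].isalpha():
            symbols.append("C")
            i += 1
        else:
            i += 1  # skip non-letters such as hyphens
    return "".join(symbols)


def fits_malay_syllables(word: str) -> bool:
    """True if the whole C/V pattern can be segmented into (C)V(C) syllables."""
    return MALAY_SYLLABLES.fullmatch(cv_pattern(word)) is not None


def identify(word: str) -> str:
    """Hypothetical two-way decision: Malay if the word fits the template, else English."""
    return "Malay" if fits_malay_syllables(word) else "English"


if __name__ == "__main__":
    for w in ["makan", "bangun", "street", "school"]:
        print(w, cv_pattern(w), identify(w))
```

This single rule is only an illustration. English words that happen to fit CV templates (for example "computer", pattern CVCCVCVC) would be labelled Malay. Diphthongs and Malay loanwords with consonant clusters are not handled.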

Index Terms: Language identification, code switching, syllable structure information, Malay, English


Bibliographic reference.  Yeong, Yin-Lai / Tan, Tien-Ping (2010): "Language identification of code switching Malay-English words using syllable structure information", In SLTU-2010, 142-145.