Sixth International Conference on Spoken Language Processing
Automatic language identification (LID) from a spoken utterance is a challenging problem. In this paper, we present an LID system that works for South Indian languages and Hindi. Each language is modeled using an approach based on Vector Quantisation. The speech is segmented into consonant-vowel (CV) units, and the performance of the system on each of the segments is studied. Our studies indicate that the presence of some CVs is crucial for each language. We also find that, for the same CV combination, the quality of the sound differs across languages. We show that once the speech signal is segmented into CVs, it is possible to perform LID on very short segments (100-150 ms) of speech.
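The general idea of VQ-based language identification can be illustrated with a minimal sketch: one codebook is trained per language, and a test segment is assigned to the language whose codebook yields the lowest average distortion. This is not the authors' implementation. The abstract does not specify the features, codebook size, or distance measure, so the 12-dimensional synthetic feature vectors, the codebook size of 4, the Euclidean distance, and the names `train_codebook`, `distortion`, and `identify` are all assumptions made for illustration.

```python
import numpy as np

def train_codebook(features, size=4, iters=20, seed=0):
    """Fit a VQ codebook with plain k-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    # Initialise codewords from randomly chosen training vectors.
    codebook = features[rng.choice(len(features), size, replace=False)].copy()
    for _ in range(iters):
        # Assign each vector to its nearest codeword (Euclidean distance).
        dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        for k in range(size):
            members = features[nearest == k]
            if len(members):  # leave empty cells unchanged
                codebook[k] = members.mean(axis=0)
    return codebook

def distortion(features, codebook):
    """Average distance from each vector to its nearest codeword."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return dists.min(axis=1).mean()

def identify(features, codebooks):
    """Return the language whose codebook gives the lowest distortion."""
    return min(codebooks, key=lambda lang: distortion(features, codebooks[lang]))

# Synthetic stand-in data (assumption): two "languages" whose feature
# vectors are drawn from Gaussians with different means.
rng = np.random.default_rng(1)
train = {
    "lang_a": rng.normal(0.0, 1.0, size=(200, 12)),
    "lang_b": rng.normal(3.0, 1.0, size=(200, 12)),
}
codebooks = {lang: train_codebook(x) for lang, x in train.items()}

# A short test segment (10 frames) drawn from the second distribution.
segment = rng.normal(3.0, 1.0, size=(10, 12))
print(identify(segment, codebooks))
```

In a real system the feature vectors would come from a speech front end applied to each CV segment, and separate codebooks could be kept per CV class as well as per language; both choices are outside what the abstract states.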
Bibliographic reference. Balleda, Jyotsana / Murthy, Hema A. / Nagarajan, T. (2000): "Language identification from short segments of speech", in ICSLP-2000, vol. 3, 1033-1036.