Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Language Identification from Short Segments of Speech

Jyotsana Balleda, Hema A Murthy, T. Nagarajan

Department of Computer Science and Engineering, Indian Institute of Technology, Madras, Chennai, India

Automatic language identification (LID) from spoken utterances is a challenging problem. In this paper, we present an LID system for the South Indian languages and Hindi. Each language is modeled using an approach based on Vector Quantisation [1]. The speech is segmented into consonant-vowel (CV) units, and the performance of the system on each of these segments is studied. Our studies indicate that the presence of certain CVs is crucial for identifying each language. We also find that the same consonant-vowel combination is realized with a different sound quality in different languages. We show that once the speech signal is segmented into CVs, LID can be performed on very short segments of speech (100-150 ms).
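The abstract states only that each language is modeled with Vector Quantisation [1]. It does not give the training or decision procedure. The sketch below shows one common VQ-based LID scheme, and it rests on several assumptions. Each language gets a codebook trained with plain k-means, used here as a stand-in for LBG-style codebook design. A test segment is then assigned to the language whose codebook quantizes its feature vectors with the lowest average distortion. Feature extraction (for example MFCCs) and CV segmentation are assumed to be done already and are not shown. A 10 ms frame shift is also assumed, so a 100-150 ms segment corresponds to roughly 10-15 vectors. All names (`train_codebook`, `avg_distortion`, `identify`, `lang_A`, `lang_B`) are hypothetical, and the data are synthetic. This is not the authors' implementation.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(vectors, size, iters=20, seed=0):
    """Train a VQ codebook with plain k-means (Lloyd's algorithm).
    Assumption: k-means stands in for the codebook design used in [1]."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]  # init from data
    for _ in range(iters):
        clusters = [[] for _ in range(size)]
        for v in vectors:  # assign each vector to its nearest codeword
            k = min(range(size), key=lambda i: sq_dist(v, codebook[i]))
            clusters[k].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old codeword if its cluster is empty
                dim = len(members[0])
                codebook[i] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook

def avg_distortion(vectors, codebook):
    """Mean distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)

def identify(segment, codebooks):
    """Pick the language whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda lang: avg_distortion(segment, codebooks[lang]))

if __name__ == "__main__":
    rng = random.Random(1)

    def synth(center, n):
        # Synthetic 2-D "features"; real systems would use e.g. MFCC frames.
        return [[c + rng.gauss(0, 0.3) for c in center] for _ in range(n)]

    train = {"lang_A": synth([0.0, 0.0], 200) + synth([3.0, 0.0], 200),
             "lang_B": synth([0.0, 3.0], 200) + synth([3.0, 3.0], 200)}
    codebooks = {lang: train_codebook(vecs, size=4) for lang, vecs in train.items()}
    segment = synth([3.0, 3.0], 12)  # ~120 ms at an assumed 10 ms frame shift
    print(identify(segment, codebooks))
```

Because the decision depends only on average distortion over the frames present, the same rule applies unchanged to a whole utterance or to a single short CV segment. This is what makes per-segment evaluation of the kind described above possible.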

