ISCA Archive ICSLP 2000

Language identification from short segments of speech

Jyotsana Balleda, Hema A Murthy, T. Nagarajan

Automatic language identification (LID) from a spoken utterance is a challenging problem. In this paper, we present an LID system that works for South Indian languages and Hindi. Each language is modeled using an approach based on Vector Quantisation [1]. The speech is segmented into different sounds (consonant-vowel units, CVs), and the performance of the system on each of the segments is studied. Our studies indicate that the presence of some CVs is crucial for each language. We also find that, for the same CV combination, the quality of the sound is different in different languages. We show that once the speech signal is segmented into CVs, it is possible to perform LID on very short segments (100-150 ms) of speech.
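The abstract does not specify the features, codebook size, or decision rule, so the following is only a minimal sketch of how a Vector Quantisation based LID system could work in general, not the authors' implementation. It assumes that per-frame feature vectors (for example cepstral coefficients) have already been extracted, that one codebook per language is trained with plain k-means, and that a segment is assigned to the language whose codebook gives the lowest average distortion. The function names `train_codebook`, `distortion`, and `identify` are hypothetical.

```python
import numpy as np

def train_codebook(features, k, iters=20, seed=0):
    """Train a (k, dim) VQ codebook with plain k-means (assumed training method)."""
    rng = np.random.default_rng(seed)
    # Initialise codewords from k distinct training frames.
    codebook = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every frame to every codeword.
        d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d.argmin(axis=1)
        for j in range(k):
            members = features[nearest == j]
            if len(members):  # leave empty cells unchanged
                codebook[j] = members.mean(axis=0)
    return codebook

def distortion(features, codebook):
    """Average squared distance from each frame to its nearest codeword."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

def identify(features, codebooks):
    """Return the language whose codebook quantises the segment with least distortion."""
    return min(codebooks, key=lambda lang: distortion(features, codebooks[lang]))
```

In use, `codebooks` would be a dictionary mapping each language name to a codebook trained on that language's frames, and `identify` would be called on the frames of a single short segment. With a hypothetical 10 ms frame shift, a 100-150 ms CV segment corresponds to roughly 10 to 15 frames.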


Cite as: Balleda, J., Murthy, H.A., Nagarajan, T. (2000) Language identification from short segments of speech. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 1033-1036

@inproceedings{balleda00_icslp,
  author={Jyotsana Balleda and Hema A Murthy and T. Nagarajan},
  title={{Language identification from short segments of speech}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={3},
  pages={1033--1036}
}