Sixth European Conference on Speech Communication and Technology
This paper describes a combined method of spoken language identification that uses the speech fundamental frequency (F0) and mel cepstral coefficients. In the first method, the F0 contour was used as prosodic information: its trajectory was approximated by polygonal lines or exponential functions, and the parameters of these approximations were used for discrimination. The second method is based on an ergodic HMM that uses cepstra as segmental information; the number of HMM states was varied from 4 to 64. The speech data consisted of 40-second spontaneous utterances spoken by 50 male speakers for each of the 10 languages considered in this study. The results show the effectiveness of the two proposed methods and that a better identification rate is obtained by combining them.
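As an illustration of the polygonal-line idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the contour is split into equal-length segments and that each segment is fitted with a least-squares straight line; the abstract does not say how segment boundaries were chosen or which parameters were kept. The function names `fit_line` and `polygonal_f0_features` and the synthetic contour are hypothetical.

```python
def fit_line(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return slope, v_mean - slope * t_mean


def polygonal_f0_features(f0, n_segments=4):
    """Approximate an F0 contour by one straight line per segment.

    Assumption: segments have (nearly) equal length. Returns a flat list
    [slope_1, intercept_1, slope_2, intercept_2, ...].
    """
    if len(f0) < 2 * n_segments:
        raise ValueError("contour too short for the requested number of segments")
    # Integer boundaries that split the contour into n_segments parts.
    bounds = [i * len(f0) // n_segments for i in range(n_segments + 1)]
    features = []
    for start, end in zip(bounds, bounds[1:]):
        features.extend(fit_line(f0[start:end]))
    return features


# Synthetic contour (hypothetical data): rises from 100 Hz to 150 Hz, then falls back.
rising = [100 + 50 * i / 19 for i in range(20)]
falling = [150 - 50 * i / 19 for i in range(20)]
feats = polygonal_f0_features(rising + falling, n_segments=2)
```

The resulting slope and intercept values could then serve as a prosodic feature vector for a classifier; which classifier the paper used for discrimination is not stated in the abstract.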
Bibliographic reference. Itahashi, Shuichi / Kiuchi, Toshikazu / Yamamoto, Mikio (1999): "Spoken language identification utilizing fundamental frequency and cepstra", In EUROSPEECH'99, 383-386.