Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Spoken Language Identification Utilizing Fundamental Frequency and Cepstra

Shuichi Itahashi, Toshikazu Kiuchi, Mikio Yamamoto

Institute of Information Sciences and Electronics, University of Tsukuba, Ibaraki, Japan

This paper describes a combined method for spoken language identification that uses speech fundamental frequency (F0) and mel-cepstral coefficients. In the first method, the F0 contour was used as prosodic information; its trajectory was approximated by polygonal lines or exponential functions, and the resulting parameters were used for discrimination. The second method is based on an ergodic HMM that uses cepstra as segmental information. The number of HMM states was varied from 4 to 64. The speech data consisted of 40-second spontaneous utterances spoken by 50 male speakers for each of the 10 languages considered in this study. The results show that both proposed methods are effective and that a better identification rate is obtained by combining them.
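As an illustration of the polygonal-line idea only, the following minimal sketch fits one least-squares line to each of several equal-sized groups of voiced frames and returns the slopes and intercepts as features. The abstract does not specify the segmentation rule, the frame rate, or the exact parameter set, so the function names (`fit_line`, `polygonal_features`), the convention that unvoiced frames are marked with 0, and the equal-count segmentation are all assumptions made here for the example, not the authors' procedure.

```python
from typing import List, Tuple


def fit_line(ts: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares line y = slope * t + intercept through the given points."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    if var_t == 0.0:
        # All points share one time stamp: fall back to a flat line.
        return 0.0, mean_y
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    slope = cov / var_t
    return slope, mean_y - slope * mean_t


def polygonal_features(
    f0_hz: List[float], frame_s: float, n_segments: int
) -> List[Tuple[float, float]]:
    """Approximate an F0 contour by n_segments straight lines.

    Assumptions (hypothetical, not from the paper): f0_hz holds one value
    per frame, unvoiced frames are marked with 0, and the voiced frames are
    split into groups of (nearly) equal size.
    Returns one (slope in Hz/s, intercept in Hz) pair per segment.
    """
    # Keep only voiced frames, each paired with its time stamp in seconds.
    voiced = [(i * frame_s, v) for i, v in enumerate(f0_hz) if v > 0]
    size = len(voiced) // n_segments
    if size < 2:
        raise ValueError("need at least two voiced frames per segment")

    features = []
    for k in range(n_segments):
        start = k * size
        # The last segment absorbs any leftover frames.
        end = (k + 1) * size if k < n_segments - 1 else len(voiced)
        ts = [t for t, _ in voiced[start:end]]
        ys = [y for _, y in voiced[start:end]]
        features.append(fit_line(ts, ys))
    return features


if __name__ == "__main__":
    # Synthetic contour with one unvoiced gap, 10 ms frames.
    contour = [0, 100, 110, 120, 0, 130, 140, 150]
    for slope, intercept in polygonal_features(contour, 0.01, 2):
        print(f"slope={slope:.1f} Hz/s, intercept={intercept:.1f} Hz")
```

The per-segment slopes and intercepts could then be concatenated into a feature vector for a classifier; how the parameters were actually used for discrimination is not stated in the abstract.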

