This paper describes an automatic language identification method that applies HMMs (Hidden Markov Models) to acoustic features. Hidden Markov modeling is used to represent the dynamics of vocal tract states. We note that each language has its own phonotactics. For the identification experiments, utterances in 4 languages (English, Japanese, Mandarin Chinese and Indonesian) were modeled by several HMMs. For each language, the utterances were spoken by 15 male speakers (10 for training the HMMs and 5 for testing). The trained HMMs showed considerable inter-language variation. The HMM topology used here is a fully connected (ergodic) model in which any state can transit to any other state, including itself. We used 2 kinds of HMMs: a discrete HMM with a codebook and a continuous density HMM (CHMM). The HMMs were trained with both the Baum-Welch (Forward-Backward) algorithm and the Viterbi algorithm; the latter was used to emphasize and extract the state transitions. The extracted state sequence was modeled by a second-order Markov model (trigram). For comparison, we also performed identification using VQ (Vector Quantization) distortion and CMDF (Continuous Mixture Density output probability Functions). The results showed that the combined CHMM and trigram method identified the 4 languages well, with a best correct identification rate of 90.3%.

Keywords: language identification, ergodic HMM, phonotactics, optimal state sequence, trigram.
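The pipeline in the abstract (Viterbi extraction of the optimal state sequence from an ergodic HMM, then a trigram over that sequence) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a discrete HMM whose observations are codebook indices. For a CHMM, `log_B[j][o]` would be replaced by a per-frame density evaluation. The names `viterbi`, `StateTrigram` and `identify`, the add-one smoothing, and the equal-weight sum of the HMM and trigram scores are all hypothetical choices made for this sketch; the abstract does not specify them.

```python
import math
from collections import defaultdict


def viterbi(obs, log_pi, log_A, log_B):
    """Return (optimal state path, its log-likelihood) for a discrete HMM.

    obs:    list of codebook indices, one per frame
    log_pi: log initial probabilities, length N
    log_A:  N x N log transition matrix (ergodic: any i -> j allowed)
    log_B:  N x K log output probabilities over K codebook symbols
    """
    n = len(log_pi)
    delta = [log_pi[s] + log_B[s][obs[0]] for s in range(n)]
    backptrs = []
    for o in obs[1:]:
        ptr, new_delta = [], []
        for j in range(n):
            # Best predecessor of state j; in an ergodic model every i is a candidate.
            i_best = max(range(n), key=lambda i: delta[i] + log_A[i][j])
            ptr.append(i_best)
            new_delta.append(delta[i_best] + log_A[i_best][j] + log_B[j][o])
        backptrs.append(ptr)
        delta = new_delta
    last = max(range(n), key=lambda s: delta[s])
    best_score = delta[last]
    # Trace the back-pointers to recover the optimal state sequence.
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best_score


class StateTrigram:
    """Second-order Markov model over HMM state sequences (add-one smoothing assumed)."""

    def __init__(self, n_states):
        self.n = n_states
        self.tri = defaultdict(int)  # counts of (s1, s2, s3)
        self.bi = defaultdict(int)   # counts of (s1, s2) as a trigram history

    def train(self, paths):
        for p in paths:
            for a, b, c in zip(p, p[1:], p[2:]):
                self.tri[(a, b, c)] += 1
                self.bi[(a, b)] += 1

    def log_prob(self, path):
        # Sum of log P(s3 | s1, s2) with Laplace smoothing over n_states outcomes.
        return sum(
            math.log((self.tri[(a, b, c)] + 1) / (self.bi[(a, b)] + self.n))
            for a, b, c in zip(path, path[1:], path[2:])
        )


def identify(obs, models):
    """models: dict language -> (log_pi, log_A, log_B, StateTrigram)."""
    def score(lang):
        log_pi, log_A, log_B, trigram = models[lang]
        path, hmm_ll = viterbi(obs, log_pi, log_A, log_B)
        # Assumption: HMM and trigram log scores are summed with equal weight.
        return hmm_ll + trigram.log_prob(path)
    return max(models, key=score)
```

In this sketch each language's own HMM decodes the test utterance, and that language's trigram then scores the resulting state path. The trigram therefore captures language-specific state-transition statistics, in the spirit of the phonotactic motivation above.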
Bibliographic reference. Seino, Takashi / Nakagawa, Seiichi (1993): "Spoken language identification using ergodic HMM with emphasized state transition", In EUROSPEECH'93, 133-136.