Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Automatic Language Identification Using Mixed-Order HMMs and Untranscribed Corpora

Ludwig Schwardt, Johan du Preez

Department of Electrical and Electronic Engineering, University of Stellenbosch, Matieland, South Africa

State-of-the-art language identification (LID) systems are based on phone recognisers and n-gram language models, which require transcribed speech databases for training. An alternative solution to the LID problem applies mixed-order hidden Markov models (HMMs) directly to untranscribed speech. The competitive performance of these mixed-order HMMs on the NIST 1996 evaluation set is very promising, considering their ease of implementation and the many possible improvements. This validates a novel mixed-order HMM training procedure and extends previous results obtained with high-order HMMs to take advantage of larger datasets.
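One common way to turn HMMs into a language identifier is to train one model per language and assign a test utterance to the language whose model gives it the highest likelihood. The sketch below assumes that decision rule. For brevity it substitutes plain first-order discrete HMMs (with observations taken to be vector-quantised frame indices) for the mixed-order HMMs described above. It is not the authors' model or training procedure, and the names `DiscreteHMM` and `identify_language` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def _log(p: float) -> float:
    """Log of a probability, mapping 0 to -inf."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a sequence of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


class DiscreteHMM:
    """Hypothetical first-order HMM over discrete symbols.

    A simplified stand-in for illustration only, NOT a mixed-order HMM.
    """

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit: List[List[float]]):
        self.log_start = [_log(p) for p in start]
        self.log_trans = [[_log(p) for p in row] for row in trans]
        self.log_emit = [[_log(p) for p in row] for row in emit]

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Forward algorithm in the log domain: returns log P(obs | model)."""
        n = len(self.log_start)
        # Initialise with the start probabilities and the first emission.
        alpha = [self.log_start[i] + self.log_emit[i][obs[0]] for i in range(n)]
        # Propagate through the remaining observations.
        for o in obs[1:]:
            alpha = [
                logsumexp([alpha[i] + self.log_trans[i][j] for i in range(n)])
                + self.log_emit[j][o]
                for j in range(n)
            ]
        return logsumexp(alpha)


def identify_language(models: Dict[str, DiscreteHMM], obs: Sequence[int]) -> str:
    """Maximum-likelihood decision: pick the language whose model scores obs highest."""
    return max(models, key=lambda lang: models[lang].log_likelihood(obs))
```

With toy single-state models such as `{"lang_a": DiscreteHMM([1.0], [[1.0]], [[0.9, 0.1]]), "lang_b": DiscreteHMM([1.0], [[1.0]], [[0.1, 0.9]])}`, the sequence `[0, 0, 1, 0]` is assigned to `"lang_a"`. Working in the log domain avoids underflow on long utterances. A real system would compare models trained on each language's speech rather than hand-set probabilities.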

Bibliographic reference.  Schwardt, Ludwig / Preez, Johan du (2000): "Automatic language identification using mixed-order HMMs and untranscribed corpora", In ICSLP-2000, vol.2, 254-257.