Sixth International Conference on Spoken Language Processing
October 16-20, 2000
Automatic Language Identification Using Mixed-Order HMMs and Untranscribed Corpora
Ludwig Schwardt, Johan du Preez
Department of Electrical and Electronic Engineering,
University of Stellenbosch, Matieland, South Africa
State-of-the-art language identification (LID) systems are
based on phone recognisers and n-gram language models, which
require transcribed speech databases for training. An
alternative solution to the LID problem directly applies mixed-order
hidden Markov models (HMMs) to untranscribed speech.
The competitive performance of these mixed-order HMMs on the
NIST 1996 evaluation set is very promising, given their ease
of implementation and the many improvements still possible. This validates
a novel mixed-order HMM training procedure and extends
previous results obtained with high-order HMMs to take advantage
of larger datasets.
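In this approach, each language gets its own HMM trained on untranscribed speech, so no phone labels are needed. An utterance is then assigned to the language whose model scores it highest. The sketch below illustrates only that decision rule. It makes several assumptions that do not come from the paper. It uses plain first-order discrete HMMs, not the authors' mixed-order models. It represents frames as already quantised integer symbols. It omits training entirely; per-language parameters would normally be estimated, for example with Baum-Welch. The class and function names (`DiscreteHMM`, `identify_language`) and the toy models are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def safe_log(p: float) -> float:
    """Log of a probability, mapping 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else -math.inf


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a non-empty sequence."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


class DiscreteHMM:
    """Hypothetical first-order HMM over discrete symbols (e.g. VQ codebook indices).

    This is a simplification: it is NOT the mixed-order HMM described in the paper.
    """

    def __init__(self, initial: List[float], transition: List[List[float]],
                 emission: List[List[float]]):
        self.log_pi = [safe_log(p) for p in initial]
        self.log_a = [[safe_log(p) for p in row] for row in transition]
        self.log_b = [[safe_log(p) for p in row] for row in emission]

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Forward algorithm in log space: returns log P(obs | model)."""
        if not obs:
            raise ValueError("observation sequence must be non-empty")
        n = len(self.log_pi)
        # alpha[j] = log P(o_1..o_t, state_t = j)
        alpha = [self.log_pi[j] + self.log_b[j][obs[0]] for j in range(n)]
        for o in obs[1:]:
            alpha = [
                logsumexp([alpha[i] + self.log_a[i][j] for i in range(n)])
                + self.log_b[j][o]
                for j in range(n)
            ]
        return logsumexp(alpha)


def identify_language(models: Dict[str, DiscreteHMM], obs: Sequence[int]) -> str:
    """Maximum-likelihood LID: pick the language whose model best explains obs."""
    return max(models, key=lambda lang: models[lang].log_likelihood(obs))


if __name__ == "__main__":
    # Toy two-state models: "lang_a" mostly emits symbol 0, "lang_b" mostly symbol 1.
    trans = [[0.9, 0.1], [0.1, 0.9]]
    models = {
        "lang_a": DiscreteHMM([0.5, 0.5], trans, [[0.9, 0.1], [0.8, 0.2]]),
        "lang_b": DiscreteHMM([0.5, 0.5], trans, [[0.1, 0.9], [0.2, 0.8]]),
    }
    print(identify_language(models, [0, 0, 0, 1, 0]))  # prints "lang_a"
```

A real system would score much longer utterances with continuous-density or higher-order models. The decision step would still be the same `max` over per-language log-likelihoods. Working in log space keeps those long products from underflowing.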
Schwardt, Ludwig / Preez, Johan du (2000):
"Automatic language identification using mixed-order HMMs and untranscribed corpora",
In ICSLP-2000, vol.2, 254-257.