ISCA Archive ICSLP 2000

Automatic language identification using mixed-order HMMs and untranscribed corpora

Ludwig Schwardt, Johan du Preez

State-of-the-art language identification (LID) systems are based on phone recognisers and n-gram language models, which require transcribed speech databases for training. An alternative solution to the LID problem applies mixed-order hidden Markov models (HMMs) directly to untranscribed speech. The competitive performance of these mixed-order HMMs on the NIST 1996 evaluation set is very promising, considering their ease of implementation and the many possible improvements. This result validates a novel mixed-order HMM training procedure and extends previous results obtained with high-order HMMs to take advantage of larger datasets.


Cite as: Schwardt, L., du Preez, J. (2000) Automatic language identification using mixed-order HMMs and untranscribed corpora. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 2, 254-257.

@inproceedings{schwardt00b_icslp,
  author={Ludwig Schwardt and Johan du Preez},
  title={{Automatic language identification using mixed-order HMMs and untranscribed corpora}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={2},
  pages={254--257}
}