5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Language Identification Incorporating Lexical Information

Driss Matrouf, Martine Adda-Decker, Lori F. Lamel, Jean-Luc Gauvain

LIMSI/CNRS, France

In this paper we explore the use of lexical information for language identification (LID). Our reference LID system uses language-dependent acoustic phone models and phone-based bigram language models. For each language, lexical information is introduced by augmenting the phone vocabulary with the N most frequent words in the training data. Combined phone and word bigram models then provide linguistic constraints during acoustic decoding. Experiments were carried out on a 4-language telephone speech corpus. Incorporating lexical information yields a relative error reduction of about 20% on spontaneous and read speech compared with the reference phone-based system. With prior speech detection, identification rates of 92%, 96% and 99% are achieved for spontaneous, read and task-specific speech segments, respectively.
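
The vocabulary augmentation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon and utterances, the "W:"/"P:" token prefixes, the function names, and the add-one smoothing are all assumptions made for illustration, since the abstract does not specify how the bigram models are estimated. The sketch keeps the N most frequent training words as single units, expands every other word into its phones, and trains one bigram model over the resulting mixed phone and word sequences.

```python
from collections import Counter
import math

def top_n_words(utterances, n):
    """Return the n most frequent words in the training utterances."""
    counts = Counter(w for utt in utterances for w in utt)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {w for w, _ in ranked[:n]}

def to_hybrid_tokens(utt, lexicon, kept_words):
    """Keep frequent words as single units; expand all other words into phones."""
    tokens = []
    for w in utt:
        if w in kept_words:
            tokens.append("W:" + w)  # hypothetical word-unit prefix
        else:
            tokens.extend("P:" + p for p in lexicon[w])  # hypothetical phone prefix
    return tokens

def train_bigram(token_seqs):
    """Collect history and bigram counts over sentence-padded token sequences."""
    history, bigrams, vocab = Counter(), Counter(), set()
    for seq in token_seqs:
        padded = ["<s>"] + seq + ["</s>"]
        vocab.update(padded)
        for a, b in zip(padded, padded[1:]):
            history[a] += 1
            bigrams[(a, b)] += 1
    return history, bigrams, vocab

def log_prob(seq, model):
    """Add-one smoothed bigram log-probability (smoothing choice is an assumption)."""
    history, bigrams, vocab = model
    v = len(vocab)
    padded = ["<s>"] + seq + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (history[a] + v))
        for a, b in zip(padded, padded[1:])
    )

# Toy data, invented for illustration only.
lexicon = {"the": ["dh", "ah"], "cat": ["k", "ae", "t"], "sat": ["s", "ae", "t"]}
utterances = [["the", "cat"], ["the", "sat"]]

kept = top_n_words(utterances, n=1)
hybrid = [to_hybrid_tokens(u, lexicon, kept) for u in utterances]
model = train_bigram(hybrid)
```

In an LID setting, one such model would be trained per language and used alongside that language's acoustic models during decoding; the sketch covers only the language-model side and omits the acoustic component entirely.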


Bibliographic reference.  Matrouf, Driss / Adda-Decker, Martine / Lamel, Lori F. / Gauvain, Jean-Luc (1998): "Language identification incorporating lexical information", In ICSLP-1998, paper 0990.