7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

On Text-Based Language Identification for Multilingual Speech Recognition Systems

Jilei Tian (1), Juha Häkkinen (2), Søren Riis (3), Kåre Jean Jensen (4)

(1) Nokia Research Center, Finland; (2) Nokia Mobile Phones, Finland; (3) Oticon A/S, Denmark; (4) Nokia Mobile Phones, Denmark

The demand for multilingual speech recognition systems is growing rapidly. Automatic language identification is an integral part of multilingual systems that use dynamic vocabularies. Most state-of-the-art automatic language identification approaches identify the language based on probabilities of phoneme sequences extracted from the acoustic signal. Such methods cannot, however, be applied to language identification based on text alone. This paper compares three text-based language identification methods aimed particularly at very short segments of text, as encountered in, e.g., name dialling or command word control applications. The first method is based on artificial neural networks, the second on decision trees, and the third on n-gram letter statistics. In a series of experiments, the neural network approach performed clearly better than the other two methods in terms of both generalization performance and complexity.
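
To make the third method concrete, the following is a minimal sketch of language identification from n-gram letter statistics on very short text such as a single name. It is not the authors' implementation: the toy training lists, the function names, the use of letter bigrams, and the add-one smoothing are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical toy training data; a real system would use large name lists.
TRAINING = {
    "finnish": ["häkkinen", "virtanen", "mäkelä", "korhonen", "nieminen", "järvinen"],
    "english": ["smith", "johnson", "williams", "brown", "taylor", "walker"],
}

def letter_ngrams(text, n=2):
    """Return the letter n-grams of a word, padded with '_' boundary markers."""
    padded = "_" * (n - 1) + text.lower() + "_"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def train(words, n=2):
    """Count letter n-grams over the training words of one language."""
    counts = Counter()
    for word in words:
        counts.update(letter_ngrams(word, n))
    return counts

def log_score(word, counts, vocab_size, n=2):
    """Add-one smoothed log-likelihood of the word's n-grams (assumed smoothing)."""
    total = sum(counts.values())
    return sum(
        math.log((counts[gram] + 1) / (total + vocab_size))
        for gram in letter_ngrams(word, n)
    )

def identify(word, models, n=2):
    """Return the language whose n-gram statistics give the highest score."""
    vocab = set()
    for counts in models.values():
        vocab.update(counts)
    vocab_size = len(vocab) + 1  # one extra slot for unseen n-grams
    return max(models, key=lambda lang: log_score(word, models[lang], vocab_size, n))

MODELS = {lang: train(words) for lang, words in TRAINING.items()}

if __name__ == "__main__":
    for name in ["lehtonen", "wilson"]:
        print(name, identify(name, MODELS))
```

With so little training data the scores are only indicative; the sketch shows the mechanism, in which each language is scored by how well its letter statistics explain the input and the best-scoring language is selected.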

Bibliographic reference. Tian, Jilei / Häkkinen, Juha / Riis, Søren / Jensen, Kåre Jean (2002): "On text-based language identification for multilingual speech recognition systems", In ICSLP-2002, 501-504.