7th International Conference on Spoken Language Processing
September 16-20, 2002
The demand for multilingual speech recognition systems is growing rapidly. Automatic language identification is an integral part of multilingual systems that use dynamic vocabularies. Most state-of-the-art automatic language identification approaches identify the language from the probabilities of phoneme sequences extracted from the acoustic signal. Such methods cannot, however, be applied when only text is available. This paper compares three text-based language identification methods aimed particularly at the very short text segments encountered in, e.g., name dialling or command word control applications. The first method is based on artificial neural networks, the second on decision trees, and the third on letter n-gram statistics. In a series of experiments, the neural network approach was clearly better than the other two in terms of both generalization performance and complexity.
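To make the third (letter n-gram) approach concrete, the following is a minimal sketch of a letter-bigram language identifier for single words. It is not the authors' implementation. The class and function names, the `^`/`$` boundary markers, the add-one smoothing, and the tiny English/Finnish training lists are all hypothetical choices made for illustration. Each language gets its own letter-bigram model, and a word is assigned to the language whose model gives it the highest log-probability.

```python
import math
from collections import Counter


def letter_ngrams(word, n=2):
    """Split a word into letter n-grams, padded with hypothetical
    start (^) and end ($) boundary markers."""
    padded = "^" * (n - 1) + word.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NGramLanguageID:
    """Toy per-language letter n-gram model with add-one smoothing
    (an assumed smoothing scheme, chosen only for simplicity)."""

    def __init__(self, n=2):
        self.n = n
        self.counts = {}    # language -> Counter of n-grams
        self.context = {}   # language -> Counter of (n-1)-letter contexts
        self.alphabet = set()

    def train(self, language, words):
        grams = self.counts.setdefault(language, Counter())
        ctx = self.context.setdefault(language, Counter())
        for word in words:
            for g in letter_ngrams(word, self.n):
                grams[g] += 1
                ctx[g[:-1]] += 1
                self.alphabet.add(g[-1])

    def log_prob(self, language, word):
        """Sum of log P(letter | preceding letters) under one language."""
        grams = self.counts[language]
        ctx = self.context[language]
        vocab = len(self.alphabet) + 1  # +1 reserves mass for unseen letters
        return sum(
            math.log((grams[g] + 1) / (ctx[g[:-1]] + vocab))
            for g in letter_ngrams(word, self.n)
        )

    def identify(self, word):
        """Return the language whose model scores the word highest."""
        return max(self.counts, key=lambda lang: self.log_prob(lang, word))


# Hypothetical toy training data (surnames), for illustration only.
model = NGramLanguageID(n=2)
model.train("en", ["smith", "johnson", "williams", "brown", "taylor"])
model.train("fi", ["virtanen", "korhonen", "nieminen", "mäkinen", "hämäläinen"])
print(model.identify("lehtinen"))  # → fi
```

Because each word yields only a handful of bigrams, the decision rests on very little evidence. This is the short-segment setting the abstract targets, and it is one reason the n-gram approach can generalize poorly on isolated names or command words.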
Bibliographic reference. Tian, Jilei / Häkkinen, Juha / Riis, Søren / Jensen, Kåre Jean (2002): "On text-based language identification for multilingual speech recognition systems", In ICSLP-2002, 501-504.