For spoken language systems to operate effectively across multiple languages, it is critical to rapidly apply the correct language-specific speech recognition models. Prior approaches either first identify the language being spoken and then select the appropriate language-specific speech recognition engine, or run speech recognition for each language in parallel and select the language and recognition hypothesis with maximum likelihood. Both of these approaches, however, introduce a significant delay before back-end natural language processing can proceed. In this work, we propose a novel method for joint language identification and speech recognition that can operate in near real-time. The proposed approach compares partial hypotheses generated on-the-fly during decoding and makes a classification decision soon after the first full hypothesis has been generated. When applied within our English-Iraqi speech-to-speech translation system, the proposed approach correctly identified the input language with 99.6% accuracy while introducing minimal delay to the end-to-end system.
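The decision rule described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how partial hypotheses are scored or compared, so the `PartialHypothesis` type, the per-frame normalization of the log-likelihood, and the function names below are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class PartialHypothesis:
    """One update from a language-specific decoder (hypothetical structure)."""
    language: str          # e.g. "english" or "iraqi"
    text: str              # words decoded so far
    log_likelihood: float  # cumulative decoder score (assumed available)
    num_frames: int        # audio frames consumed so far
    is_final: bool         # True once the decoder emits a full hypothesis


def normalized_score(hyp: PartialHypothesis) -> float:
    # Assumption: dividing by frame count makes hypotheses that cover
    # different amounts of audio roughly comparable.
    return hyp.log_likelihood / max(hyp.num_frames, 1)


def identify_language(
    stream: Iterable[PartialHypothesis],
) -> Optional[PartialHypothesis]:
    """Track the latest hypothesis per language and decide as soon as any
    decoder produces its first full hypothesis."""
    latest: Dict[str, PartialHypothesis] = {}
    for hyp in stream:
        latest[hyp.language] = hyp
        if hyp.is_final:
            # Decide immediately instead of waiting for the other decoders.
            return max(latest.values(), key=normalized_score)
    # Stream ended without a full hypothesis: fall back to the best partial.
    return max(latest.values(), key=normalized_score) if latest else None
```

In this sketch the decision is returned without waiting for the remaining decoders to finish, which is what keeps the added delay small. A real system would likely also need a confidence margin or fallback when the scores are close, which is omitted here.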
Index Terms. Language Identification, Speech Recognition, Multilingual Spoken Language Understanding
Cite as: Lim, D.C.Y., Lane, I., Waibel, A. (2010) Real-time spoken language identification and recognition for speech-to-speech translation. Proc. International Workshop on Spoken Language Translation (IWSLT 2010), 307-312
@inproceedings{lim10_iwslt,
  author={Daniel Chung Yong Lim and Ian Lane and Alex Waibel},
  title={{Real-time spoken language identification and recognition for speech-to-speech translation}},
  year=2010,
  booktitle={Proc. International Workshop on Spoken Language Translation (IWSLT 2010)},
  pages={307--312}
}