International Workshop on Spoken Language Translation (IWSLT) 2010
For spoken language systems to operate effectively across multiple languages, it is critical to rapidly apply the correct language-specific speech recognition models. Prior approaches either first identify the language being spoken and then select the appropriate language-specific speech recognition engine, or perform speech recognition in parallel and select the language and recognition hypothesis with maximum likelihood. Both of these approaches, however, introduce a significant delay before back-end natural language processing can proceed. In this work, we propose a novel method for joint language identification and speech recognition that can operate in near real-time. The proposed approach compares partial hypotheses generated on-the-fly during decoding and makes a classification decision soon after the first full hypothesis has been generated. When applied within our English-Iraqi speech-to-speech translation system, the proposed approach correctly identified the input language with 99.6% accuracy while introducing minimal delay to the end-to-end system.
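The decision rule described above can be illustrated with a minimal sketch. The abstract does not state how partial hypotheses are compared, so the per-frame normalized log-likelihood used here is an assumption, and the names `PartialHypothesis` and `identify_language` are hypothetical rather than taken from the paper. The sketch consumes updates from language-specific decoders running in parallel and commits to a language as soon as any decoder emits its first full hypothesis.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class PartialHypothesis:
    """One update from a language-specific decoder (hypothetical structure)."""
    language: str          # e.g. "english" or "iraqi"
    text: str              # words recognized so far
    log_likelihood: float  # cumulative log-likelihood of the hypothesis
    num_frames: int        # number of audio frames consumed so far
    is_final: bool         # True once the decoder emits a full hypothesis


def per_frame_score(hyp: PartialHypothesis) -> float:
    # Assumption: normalize by frame count so decoders that have consumed
    # different amounts of audio remain comparable.
    return hyp.log_likelihood / max(hyp.num_frames, 1)


def identify_language(updates: Iterable[PartialHypothesis]) -> Optional[str]:
    """Return the chosen language once the first full hypothesis arrives.

    Returns None if the stream ends before any decoder finishes.
    """
    latest: Dict[str, PartialHypothesis] = {}
    for hyp in updates:
        # Keep only the most recent hypothesis from each decoder.
        latest[hyp.language] = hyp
        if hyp.is_final:
            # Decide immediately, comparing the finished hypothesis with
            # the other decoders' current partial hypotheses.
            best = max(latest.values(), key=per_frame_score)
            return best.language
    return None
```

Because the decision is taken at the first full hypothesis instead of after every decoder has finished, the remaining decoders can be stopped early, which is one plausible reading of how the added delay stays small.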
Index Terms. Language Identification, Speech Recognition, Multilingual Spoken Language Understanding
Bibliographic reference. Lim, Daniel Chung Yong / Lane, Ian / Waibel, Alex (2010): "Real-time spoken language identification and recognition for speech-to-speech translation", In IWSLT-2010, 307-312.