ISCA Archive IWSLT 2010

Real-time spoken language identification and recognition for speech-to-speech translation

Daniel Chung Yong Lim, Ian Lane, Alex Waibel

For spoken language systems to operate effectively across multiple languages, it is critical to rapidly apply the correct language-specific speech recognition models. Prior approaches either first identify the language being spoken and then select the appropriate language-specific speech recognition engine, or perform speech recognition in parallel and select the language and recognition hypothesis with maximum likelihood. Both of these approaches, however, introduce a significant delay before back-end natural language processing can proceed. In this work, we propose a novel method for joint language identification and speech recognition that can operate in near real-time. The proposed approach compares partial hypotheses generated on-the-fly during decoding and generates a classification decision soon after the first full hypothesis has been generated. When applied within our English-Iraqi speech-to-speech translation system, the proposed approach correctly identified the input language with 99.6% accuracy while introducing minimal delay to the end-to-end system.
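
The idea of comparing partial hypotheses and deciding once the first full hypothesis is available can be illustrated with a minimal sketch. The abstract does not specify the scoring or decision rule, so everything below is an assumption made for illustration and not the authors' method: the `PartialHypothesis` record, the `identify_language` function, and the use of per-frame average log-likelihood as the comparison score are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class PartialHypothesis:
    """Hypothetical update emitted by one language-specific decoder."""
    language: str            # e.g. "english" or "iraqi"
    frames: int              # number of audio frames decoded so far
    log_likelihood: float    # cumulative log-likelihood of the best partial path
    is_final: bool = False   # True once this decoder has a full hypothesis


def identify_language(updates: Iterable[PartialHypothesis]) -> Optional[str]:
    """Pick a language as soon as any decoder emits a full hypothesis.

    Assumption: scores are compared as per-frame average log-likelihoods,
    so decoders that have consumed different amounts of audio stay comparable.
    """
    latest: Dict[str, float] = {}
    for hyp in updates:
        if hyp.frames <= 0:
            continue  # nothing decoded yet, no usable score
        latest[hyp.language] = hyp.log_likelihood / hyp.frames
        if hyp.is_final:
            # Decide immediately instead of waiting for the other decoders.
            return max(latest, key=latest.get)
    # Stream ended without a full hypothesis: fall back to the best partial score.
    return max(latest, key=latest.get) if latest else None


# Interleaved updates from two decoders running in parallel (made-up scores).
stream = [
    PartialHypothesis("english", 10, -50.0),
    PartialHypothesis("iraqi", 10, -80.0),
    PartialHypothesis("english", 20, -110.0),
    PartialHypothesis("iraqi", 20, -150.0, is_final=True),
    PartialHypothesis("english", 30, -1000.0, is_final=True),  # never reached
]
decision = identify_language(stream)
```

In this sketch the decision is taken at the first final hypothesis, so later updates are never consumed; a real system would also need to handle ties and low-confidence cases, which the abstract does not describe.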

Index Terms. Language Identification, Speech Recognition, Multilingual Spoken Language Understanding


Cite as: Lim, D.C.Y., Lane, I., Waibel, A. (2010) Real-time spoken language identification and recognition for speech-to-speech translation. Proc. International Workshop on Spoken Language Translation (IWSLT 2010), 307-312

@inproceedings{lim10_iwslt,
  author={Daniel Chung Yong Lim and Ian Lane and Alex Waibel},
  title={{Real-time spoken language identification and recognition for speech-to-speech translation}},
  year=2010,
  booktitle={Proc. International Workshop on Spoken Language Translation (IWSLT 2010)},
  pages={307--312}
}