Automatic systems that provide conversational practice for beginners would be a valuable addition to existing aids for foreign language teaching. To this end, the SCILL (Spoken Conversational Interaction for Language Learning) project is developing a spoken dialogue system capable of maintaining interactive dialogues with non-native students in the target language. However, the effective realisation of the intelligent language understanding and dialogue management needed for such a system requires robust recognition of poorly articulated non-native speech. This paper studies several popular techniques for robust acoustic modelling, including HLDA, MAP and CMLLR, on non-native speech data within a specific dialogue domain. In addition, a novel approach for using cross-language speech data to adapt the acoustic models is described and shown to be useful when very limited non-native adaptation data is available. The experimental results give a clear picture of how to improve recognition performance on non-native speech for a specific task, and should be of more general interest to those developing multi-lingual spoken dialogue systems.
Cite as: Ye, H., Young, S. (2005) Improving the speech recognition performance of beginners in spoken conversational interaction for language learning. Proc. Interspeech 2005, 289-292, doi: 10.21437/Interspeech.2005-160
@inproceedings{ye05_interspeech,
  author={Hui Ye and Steve Young},
  title={{Improving the speech recognition performance of beginners in spoken conversational interaction for language learning}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={289--292},
  doi={10.21437/Interspeech.2005-160}
}