8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Speaker Adaptation for Non-Native Speakers Using Bilingual English Lexicon and Acoustic Models

S. Matsunaga, A. Ogawa, Yoshikazu Yamaguchi, A. Imamura

NTT Corporation, Japan

This paper proposes a supervised speaker adaptation method that is effective for English speech produced by both non-native (here, Japanese) and native English speakers. The method uses English and Japanese phoneme acoustic models together with a pronunciation lexicon in which each word has both English and Japanese phoneme transcriptions. The same utterances are used to adapt both acoustic models. The recognition system uses the two adapted acoustic models and the lexicon, and takes as its result the highest-likelihood word sequence found among combinations of English-pronounced and Japanese-pronounced words. Continuous speech recognition experiments show that the proposed adaptation method greatly improves recognition performance for both Japanese-accented and native English speech. The system using bilingual adapted models achieves higher accuracy for Japanese speakers than the systems using monolingual models, while maintaining the same performance level for native speakers as an English recognition system using an adapted English model.
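The selection step described in the abstract can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' decoder: the lexicon entries, phoneme symbols, and log-likelihood values are invented placeholders, and the function name `decode` is hypothetical. A real system would obtain the scores by aligning the audio against the adapted English and Japanese acoustic models during search, rather than reading them from a table.

```python
from typing import Dict, List, Tuple

LANGS = ("en", "ja")

# Hypothetical bilingual lexicon: each word carries an English and a
# Japanese phoneme transcription (symbols are illustrative only).
LEXICON: Dict[str, Dict[str, List[str]]] = {
    "turn":  {"en": ["t", "er", "n"], "ja": ["t", "a", "a", "N"]},
    "right": {"en": ["r", "ay", "t"], "ja": ["r", "a", "i", "t", "o"]},
    "light": {"en": ["l", "ay", "t"], "ja": ["r", "a", "i", "t", "o"]},
}

# Placeholder acoustic log-likelihoods of the observed audio for each
# (word, pronunciation language) pair. "en" stands in for a score from the
# adapted English model, "ja" for one from the adapted Japanese model.
SCORES: Dict[Tuple[str, str], float] = {
    ("turn", "en"): -12.0, ("turn", "ja"): -9.5,
    ("right", "en"): -8.0, ("right", "ja"): -10.0,
    ("light", "en"): -11.0, ("light", "ja"): -10.5,
}


def decode(
    hypotheses: List[List[str]],
    scores: Dict[Tuple[str, str], float],
) -> Tuple[float, List[Tuple[str, str]]]:
    """Return the best total score and its (word, language) sequence.

    Each word may be realized with either its English or its Japanese
    pronunciation, so a single hypothesis can mix the two.
    """
    best_total = float("-inf")
    best_path: List[Tuple[str, str]] = []
    for words in hypotheses:
        total = 0.0
        path: List[Tuple[str, str]] = []
        for word in words:
            # Pick the better-scoring pronunciation variant for this word.
            lang = max(LANGS, key=lambda l: scores[(word, l)])
            total += scores[(word, lang)]
            path.append((word, lang))
        if total > best_total:
            best_total, best_path = total, path
    return best_total, best_path


best_total, best_path = decode([["turn", "right"], ["turn", "light"]], SCORES)
print(best_total, best_path)
```

With these placeholder scores, the winning hypothesis mixes a Japanese-pronounced "turn" with an English-pronounced "right", which is the kind of per-word combination the abstract describes.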

Bibliographic reference.  Matsunaga, S. / Ogawa, A. / Yamaguchi, Yoshikazu / Imamura, A. (2003): "Speaker adaptation for non-native speakers using bilingual English lexicon and acoustic models", In EUROSPEECH-2003, 3113-3116.