EUROSPEECH 2003 - INTERSPEECH 2003
This paper proposes a supervised speaker adaptation method that is effective for English speech produced by both non-native (here, Japanese) and native English speakers. The method uses English and Japanese phoneme acoustic models together with a pronunciation lexicon in which each word has both an English and a Japanese phoneme transcription. The same utterances are used to adapt both acoustic models. The recognition system uses the two adapted acoustic models and the lexicon, and its output is the highest-likelihood word sequence, in which each word may be matched with either its English or its Japanese pronunciation. Continuous speech recognition experiments show that the proposed adaptation method greatly improves recognition performance for both Japanese-accented and native English speech. The system using the bilingual adapted models achieves higher accuracy for Japanese speakers than the systems using monolingual models, while maintaining, for native speakers, the same performance level as an English recognition system using an English adapted model.
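The recognition step described above (a bilingual lexicon scored by two adapted acoustic models, with each word free to take its English or Japanese pronunciation) can be illustrated with a toy sketch. It assumes, hypothetically, that candidate word sequences and per-word acoustic log-likelihoods under the adapted English and Japanese models have already been computed over a fixed segmentation. A real decoder would search word boundaries and hypotheses jointly (for example with a Viterbi beam search) and add language-model scores; both are omitted here. The lexicon entries, phoneme symbols, function names (`best_pronunciation`, `score_hypothesis`, `decode`) and scores are illustrative assumptions, not taken from the paper, and the adaptation step itself is not modeled.

```python
from typing import Dict, List, Tuple

# Hypothetical bilingual lexicon: every word carries an English ("en") and a
# Japanese-style ("ja") phoneme transcription. Symbols are illustrative only.
LEXICON: Dict[str, Dict[str, List[str]]] = {
    "good": {"en": ["g", "uh", "d"], "ja": ["g", "u", "d", "o"]},
    "could": {"en": ["k", "uh", "d"], "ja": ["k", "u", "d", "o"]},
    "morning": {"en": ["m", "ao", "r", "n", "ih", "ng"],
                "ja": ["m", "o", "o", "n", "i", "N", "g", "u"]},
}

# Assumed precomputed acoustic log-likelihood of a word's speech segment,
# keyed by language tag ("en" = adapted English model with the English
# transcription, "ja" = adapted Japanese model with the Japanese one).
WordScores = Dict[str, float]
Hypothesis = List[Tuple[str, WordScores]]


def best_pronunciation(word: str, scores: WordScores) -> Tuple[str, float]:
    """Pick the pronunciation variant (language) with the highest score."""
    candidates = [(lang, scores[lang]) for lang in LEXICON[word] if lang in scores]
    return max(candidates, key=lambda c: c[1])


def score_hypothesis(hyp: Hypothesis) -> Tuple[float, List[str]]:
    """Sum per-word log-likelihoods, letting each word choose its variant."""
    total = 0.0
    langs: List[str] = []
    for word, scores in hyp:
        lang, ll = best_pronunciation(word, scores)
        total += ll
        langs.append(lang)
    return total, langs


def decode(hyps: List[Hypothesis]) -> Tuple[List[str], List[str], float]:
    """Return the highest-likelihood word sequence and its chosen variants."""
    best = max(hyps, key=lambda h: score_hypothesis(h)[0])
    total, langs = score_hypothesis(best)
    return [word for word, _ in best], langs, total


if __name__ == "__main__":
    # Made-up scores for two competing hypotheses over the same utterance.
    example: List[Hypothesis] = [
        [("good", {"en": -120.0, "ja": -110.0}),
         ("morning", {"en": -200.0, "ja": -215.0})],
        [("could", {"en": -118.0, "ja": -125.0}),
         ("morning", {"en": -205.0, "ja": -212.0})],
    ]
    print(decode(example))  # (['good', 'morning'], ['ja', 'en'], -310.0)
```

In this toy run the winning sequence mixes a Japanese-pronounced "good" with an English-pronounced "morning", which is the kind of per-word combination of English- and Japanese-pronounced words the abstract describes.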
Bibliographic reference. Matsunaga, S. / Ogawa, A. / Yamaguchi, Yoshikazu / Imamura, A. (2003): "Speaker adaptation for non-native speakers using bilingual English lexicon and acoustic models", In EUROSPEECH-2003, 3113-3116.