EUROSPEECH 2001 Scandinavia
This paper studies the selection of existing acoustic models, trained on native English speech, to improve recognition of non-native English talkers' speech. The problem is addressed from the perspective that foreign accents prevent the detailed tri-phone models commonly used in high-performance speech recognition systems from matching these talkers' speech well, and that an appropriate level of context-dependent acoustic modeling is therefore needed for foreign-accented speakers. In this work, model complexity is selected in two ways: by empirically choosing among a set of model tying thresholds, and by applying the minimum description length (MDL) principle. An experiment was performed on the Wall Street Journal task with three non-native English talkers with Chinese accents (276 sentences). Compared with the result obtained using models optimized for native English speakers, the best model tying threshold and MDL yielded similar and significant reductions in recognition word errors, by 23%.
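As a rough illustration of MDL-based complexity selection, the sketch below scores candidate model sets with the generic two-part MDL criterion (negative log-likelihood plus a penalty of half the parameter count times the log of the sample count) and keeps the smallest score. The abstract does not give the exact formulation used in the paper, so the criterion, the function names, the candidate names, and all numbers here are assumptions for illustration only.

```python
import math


def mdl_score(log_likelihood, num_params, num_frames):
    """Generic two-part MDL: -log L + (k / 2) * log(N).

    This is the textbook form, assumed here; the paper's exact
    criterion is not stated in the abstract.
    """
    return -log_likelihood + 0.5 * num_params * math.log(num_frames)


def select_by_mdl(candidates, num_frames):
    """Return the name of the candidate with the smallest MDL score.

    candidates maps a name to a (log_likelihood, num_params) pair.
    """
    return min(
        candidates,
        key=lambda name: mdl_score(*candidates[name], num_frames),
    )


# Hypothetical values, not taken from the paper: finer tying fits the
# data better (higher log-likelihood) but uses more parameters.
candidates = {
    "coarse_tying": (-60000.0, 500),
    "medium_tying": (-55000.0, 1500),
    "fine_tying": (-54000.0, 5000),
}

best = select_by_mdl(candidates, num_frames=10000)
print(best)
```

With these made-up numbers, the mid-sized model wins: the finest model's small likelihood gain does not pay for its extra parameters, which mirrors the abstract's argument that the most detailed models are not necessarily the best match for accented speech.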
Bibliographic reference. He, Xiaodong / Zhao, Yunxin (2001): "Model complexity optimization for nonnative English speakers", In EUROSPEECH-2001, 1461-1464.