Speech Recognition and Intrinsic Variation (SRIV2006)

Toulouse, France
May 20, 2006

Acoustic Modeling of Accented English Speech for Large-Vocabulary Speech Recognition

Konstantin Markov, Satoshi Nakamura

ATR Spoken Language Communication Research Labs., Kyoto, Japan

In this paper, we present a study on the robustness of speech recognition to accent variations. The differences that characterize accents in speech can be divided into two kinds: phonetic and acoustic. We focus on the acoustic differences and on methods of acoustic model design and training that minimize the effect of accent variations on the performance of a speech recognition system. When accented training data is available, a typical approach is to train a separate acoustic model for each accent and use the models in parallel. Another approach is to pool all the data together and train a single model with more parameters, on the assumption that the training algorithm can learn the accent variations. We compared both of these approaches with a method based on the hybrid HMM/Bayesian Network (HMM/BN) framework, using a database consisting of speech in the three major accents of English: American, British and Australian. The results of our experiments show that, in the matched-accent case, the accent-dependent acoustic models perform best. However, if the accent is unknown, the pooled-data training approach is preferable for models with a small number of parameters. In contrast, when the amount of data allows training models with a relatively large number of parameters, the HMM/BN model is the best choice.
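
The contrast between accent-dependent models used in parallel and a single pooled model can be illustrated with a minimal sketch. This is a toy example under stated assumptions, not the paper's system: each "acoustic model" is a single 1-D Gaussian rather than an HMM, the features are synthetic, and all names (fit_gaussian, decode_parallel, the accent means) are hypothetical. The HMM/BN method is not shown.

```python
import math
import random


def fit_gaussian(samples):
    """Maximum-likelihood mean and variance of 1-D samples."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    return mean, max(var, 1e-6)  # floor the variance to avoid division by zero


def log_likelihood(samples, model):
    """Total Gaussian log-likelihood of the samples under (mean, var)."""
    mean, var = model
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        for x in samples
    )


random.seed(0)

# Assumption: each accent shifts the mean of a synthetic 1-D feature.
accent_means = {"American": 0.0, "British": 3.0, "Australian": 6.0}
train = {
    accent: [random.gauss(mu, 1.0) for _ in range(500)]
    for accent, mu in accent_means.items()
}

# Approach 1: one model per accent, evaluated in parallel.
accent_models = {accent: fit_gaussian(xs) for accent, xs in train.items()}

# Approach 2: one model trained on the pooled data of all accents.
pooled_model = fit_gaussian([x for xs in train.values() for x in xs])


def decode_parallel(utterance):
    """Score the utterance with every accent model and keep the best one."""
    scores = {a: log_likelihood(utterance, m) for a, m in accent_models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# A synthetic test utterance from the American distribution.
utterance = [random.gauss(accent_means["American"], 1.0) for _ in range(50)]
best_accent, best_score = decode_parallel(utterance)
pooled_score = log_likelihood(utterance, pooled_model)

print("best accent model:", best_accent)
print("parallel score > pooled score:", best_score > pooled_score)
```

In this toy setting the pooled Gaussian has to spread its variance over all three accents, so it fits any single accent less sharply than the matched accent-dependent model. A pooled model with more parameters (for example, more mixture components) would narrow that gap, which is the trade-off the abstract describes.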

Bibliographic reference. Markov, Konstantin / Nakamura, Satoshi (2006): "Acoustic modeling of accented English speech for large-vocabulary speech recognition", in SRIV-2006, 113-118.