8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Modeling Pronunciation Variation using Artificial Neural Networks for English Spontaneous Speech

Ken Chen, Mark Hasegawa-Johnson

University of Illinois at Urbana-Champaign, USA

Pronunciation variation in conversational speech causes a significant number of word errors in large-vocabulary automatic speech recognition. Rule-based and decision-tree-based approaches have previously been proposed to model pronunciation variation. In this paper, we report our work on modeling pronunciation variation using artificial neural networks (ANNs). The results we achieve are significantly better than previously published ones on two different corpora, indicating that ANNs may be better suited for modeling pronunciation variation than other statistical models that have been previously investigated. Our experiments indicate that binary distinctive features can effectively represent the phonological context. We also find that including a pitch-accent feature in the input improves the prediction of pronunciation variation on a ToBI-labeled subset of the Switchboard corpus.
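The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it describes: encode the phonological context of a canonical phone as binary distinctive features and let a neural network (here a one-hidden-layer MLP with softmax output) assign probabilities to possible surface realizations. The feature set, phone labels, surface-form inventory, and network sizes below are invented for illustration and are not taken from the paper.

```python
import math
import random

# Hypothetical binary distinctive features for a few phones
# (voiced, consonantal, continuant, nasal) -- illustrative only.
DISTINCTIVE_FEATURES = {
    "t":  (0, 1, 0, 0),
    "d":  (1, 1, 0, 0),
    "n":  (1, 1, 0, 1),
    "iy": (1, 0, 1, 0),
}

# Hypothetical surface realizations of canonical /t/:
# canonical, flapped, deleted.
SURFACE_LABELS = ["t", "dx", "del"]


def encode_context(prev_phone, phone, next_phone):
    """Concatenate the binary feature vectors of a 3-phone window."""
    vec = []
    for p in (prev_phone, phone, next_phone):
        vec.extend(DISTINCTIVE_FEATURES[p])
    return vec


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer MLP: sigmoid hidden units, softmax output."""
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
        for w, b in zip(w_hidden, b_hidden)
    ]
    logits = [
        sum(wi * hi for wi, hi in zip(w, hidden)) + b
        for w, b in zip(w_out, b_out)
    ]
    z = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - z) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]  # P(surface form | context)


# Random (untrained) weights, just to show the data flow end to end.
random.seed(0)
n_in, n_hidden, n_out = 12, 8, len(SURFACE_LABELS)
w_hidden = [[random.uniform(-1, 1) for _ in range(n_in)]
            for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [[random.uniform(-1, 1) for _ in range(n_hidden)]
         for _ in range(n_out)]
b_out = [0.0] * n_out

x = encode_context("n", "t", "iy")  # e.g. intervocalic /t/ after a nasal
probs = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
for label, p in zip(SURFACE_LABELS, probs):
    print(f"P({label} | context) = {p:.3f}")
```

In a real system the weights would be trained on phonetically transcribed data (e.g. Switchboard), and additional inputs such as a pitch-accent indicator could simply be appended to the context feature vector.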


Bibliographic reference. Chen, Ken / Hasegawa-Johnson, Mark (2004): "Modeling pronunciation variation using artificial neural networks for English spontaneous speech", in INTERSPEECH-2004, 1461-1464.