INTERSPEECH 2010
11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Learning New Word Pronunciations from Spoken Examples

Ibrahim Badr, Ian McGraw, James Glass

MIT, USA

A lexicon containing explicit mappings between words and pronunciations is an integral part of most automatic speech recognizers (ASRs). While many ASR components can be trained or adapted using data, the lexicon is one of the few that typically remains static until experts make manual changes. This work takes a step towards alleviating the need for manual intervention by integrating a popular grapheme-to-phoneme conversion technique with acoustic examples to automatically learn high-quality baseform pronunciations for unknown words. We explore two models in a Bayesian framework, and discuss their individual advantages and shortcomings. We show that both are able to generate better-than-expert pronunciations with respect to word error rate on an isolated word recognition task.
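As a rough illustration of how a grapheme-to-phoneme prior might be combined with acoustic evidence in a Bayesian way, the sketch below scores each candidate baseform b for a word w by log P(b | w) plus the summed log-likelihoods of the spoken examples. It is a minimal sketch, not either of the two models explored in the paper: the function name, the candidate pronunciations, and all numbers are hypothetical, and it assumes the utterances are conditionally independent given the pronunciation.

```python
import math


def best_pronunciation(g2p_log_prior, acoustic_log_likelihoods):
    """Pick the candidate maximizing log P(b|w) + sum_i log p(u_i|b).

    g2p_log_prior: dict mapping candidate pronunciation -> log prior
        (assumed to come from a grapheme-to-phoneme model).
    acoustic_log_likelihoods: dict mapping candidate pronunciation ->
        list of per-utterance acoustic log-likelihoods (assumed to come
        from a recognizer's acoustic model).
    """
    scores = {}
    for pron, log_prior in g2p_log_prior.items():
        # Independence assumption: utterance log-likelihoods simply add.
        scores[pron] = log_prior + sum(acoustic_log_likelihoods[pron])
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical values for illustration only.
g2p_log_prior = {
    "t ah m ey t ow": math.log(0.6),
    "t ah m aa t ow": math.log(0.4),
}
acoustic = {
    "t ah m ey t ow": [-12.0, -11.5],
    "t ah m aa t ow": [-10.0, -10.5],
}

best, score = best_pronunciation(g2p_log_prior, acoustic)
print(best, round(score, 3))
```

In this toy case the acoustic evidence outweighs the letter-based prior, so the candidate that the grapheme-to-phoneme model ranked second is selected. That is the kind of correction spoken examples are meant to provide.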

Bibliographic reference.  Badr, Ibrahim / McGraw, Ian / Glass, James (2010): "Learning new word pronunciations from spoken examples", In INTERSPEECH-2010, 2294-2297.