A lexicon containing explicit mappings between words and pronunciations is an integral part of most automatic speech recognizers (ASRs). While many ASR components can be trained or adapted using data, the lexicon is one of the few that typically remains static until experts make manual changes. This work takes a step towards alleviating the need for manual intervention by integrating a popular grapheme-to-phoneme conversion technique with acoustic examples to automatically learn high-quality baseform pronunciations for unknown words. We explore two models in a Bayesian framework and discuss their individual advantages and shortcomings. We show that both are able to generate better-than-expert pronunciations with respect to word error rate on an isolated-word recognition task.
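The core idea of combining grapheme-to-phoneme (G2P) hypotheses with acoustic evidence can be sketched as a simple Bayesian selection: the G2P model supplies a prior over candidate pronunciations, spoken examples supply an acoustic likelihood, and the learned baseform is the posterior argmax. The sketch below is purely illustrative; the candidate pronunciations, scores, and the `best_baseform` helper are assumptions for exposition, not the authors' actual system.

```python
import math

# Assumed G2P output for a word: candidate pronunciations (ARPAbet-style
# strings) with prior log-probabilities. Values are illustrative only.
g2p_candidates = {
    "t eh s t": math.log(0.6),
    "t ih s t": math.log(0.3),
    "t eh s":   math.log(0.1),
}

# Assumed acoustic log-likelihoods of the spoken examples under each
# candidate pronunciation (e.g., from forced alignment). Illustrative.
acoustic_loglik = {
    "t eh s t": -10.0,
    "t ih s t": -14.0,
    "t eh s":   -18.0,
}

def best_baseform(g2p_logprior, acoustic):
    """Pick the pronunciation maximizing log-prior + log-likelihood,
    i.e., the posterior argmax under a Bayesian combination."""
    return max(g2p_logprior, key=lambda p: g2p_logprior[p] + acoustic[p])

print(best_baseform(g2p_candidates, acoustic_loglik))  # prints "t eh s t"
```

Here the acoustic evidence can overrule a weak G2P prior when spoken examples strongly favor a different candidate, which is what lets the learned baseforms improve on G2P output alone.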
Bibliographic reference: Badr, Ibrahim / McGraw, Ian / Glass, James (2010): "Learning new word pronunciations from spoken examples", in Proceedings of INTERSPEECH 2010, pp. 2294-2297.