EUROSPEECH 2003 - INTERSPEECH 2003
We describe an algorithm to learn word pronunciations from acoustic data. The algorithm jointly optimizes the pronunciation of a word using (a) the acoustic match of this pronunciation to the observed data, and (b) how "linguistically reasonable" the pronunciation is. Variations of word pronunciations in the recognition dictionary (which was created by linguists) are used to train a model of whether new hypothesized pronunciations are reasonable or not. The algorithm is well suited to learning the pronunciations of proper names. Experiments on a corporate name dialing database show a 40% error rate reduction with respect to a letter-to-phone pronunciation engine.
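As an illustration only, the joint optimization can be read as choosing the candidate pronunciation that maximizes a combination of an acoustic score and a linguistic-plausibility score. The sketch below assumes both scores are log-probabilities combined by a weighted sum; the abstract does not state how the two criteria are combined, so the `weight` parameter, the `Candidate` type, and all numeric scores are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Candidate:
    """A hypothesized pronunciation with its two scores (both hypothetical)."""
    phones: Tuple[str, ...]
    acoustic_logp: float    # log-likelihood of the observed audio given the phones
    linguistic_logp: float  # log-probability that the pronunciation is "reasonable"


def joint_score(c: Candidate, weight: float = 1.0) -> float:
    # Assumed combination: weighted sum of log-scores (not taken from the paper).
    return c.acoustic_logp + weight * c.linguistic_logp


def best_pronunciation(candidates: Iterable[Candidate], weight: float = 1.0) -> Candidate:
    # Pick the candidate with the highest joint score.
    return max(candidates, key=lambda c: joint_score(c, weight))


# Made-up scores for two candidate pronunciations of one name.
plausible = Candidate(("b", "ow", "f", "ey"), acoustic_logp=-120.0, linguistic_logp=-2.0)
odd = Candidate(("b", "ow", "f", "ay", "z"), acoustic_logp=-118.0, linguistic_logp=-9.0)

acoustic_only = best_pronunciation([plausible, odd], weight=0.0)
joint = best_pronunciation([plausible, odd], weight=1.0)
```

With these made-up numbers, the acoustic score alone prefers the second candidate, while the joint score prefers the first because the linguistic term penalizes the implausible variant.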
Bibliographic reference. Beaufays, Francoise / Sankar, Ananth / Williams, Shaun / Weintraub, Mitch (2003): "Learning linguistically valid pronunciations from acoustic data", In EUROSPEECH-2003, 2593-2596.