8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Combining Linguistic Knowledge and Acoustic Information in Automatic Pronunciation Lexicon Generation

Grace Chung, Chao Wang, Stephanie Seneff, Ed Filisko, Min Tang


This paper describes several experiments aimed at the long-term goal of enabling a conversational interface to automatically improve its pronunciation lexicon over time through direct interactions with end users and from available Web sources. We selected a set of 200 rare words from the OGI corpus of spoken names, and performed several experiments combining spelling and pronunciation information to hypothesize phonemic baseforms for these words. We evaluated the quality of the resulting baseforms through a series of recognition experiments, using the 200 words in an isolated-word recognition task. Also reported is a modification to the letter-to-sound system, utilizing a letter-phoneme n-gram language model, either alone or in combination with the original "column-bigram" model, for additional linguistic constraint. The experiments confirm that acoustic information drawn from spoken examples of the words can greatly improve the quality of the baseforms, as measured by the recognition error rate.
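To illustrate the kind of model the abstract refers to, the sketch below builds a toy letter-phoneme n-gram (bigram) language model and uses it to score candidate phonemic baseforms. This is not the authors' system: the alignment pairs, ARPAbet-style phoneme labels, and add-one smoothing are assumptions chosen purely for illustration; real letter-to-sound systems learn letter-phoneme alignments from a large lexicon.

```python
from collections import defaultdict
import math

# Toy training data: letter-phoneme alignment pairs per word.
# Hand-written for illustration only; a real system would derive
# these alignments automatically from a pronunciation lexicon.
aligned_words = [
    [("c", "k"), ("a", "ae"), ("t", "t")],
    [("c", "k"), ("a", "aa"), ("r", "r")],
    [("k", "k"), ("i", "ih"), ("t", "t")],
]

# Count bigrams over joint letter-phoneme units, with word boundaries.
counts = defaultdict(lambda: defaultdict(int))
for word in aligned_words:
    units = ["<s>"] + [f"{l}:{p}" for l, p in word] + ["</s>"]
    for prev, cur in zip(units, units[1:]):
        counts[prev][cur] += 1

def score(word):
    """Log-probability of a candidate letter-phoneme sequence under
    the bigram model, with add-one smoothing over the unit vocabulary."""
    vocab = {u for nxts in counts.values() for u in nxts} | set(counts)
    units = ["<s>"] + [f"{l}:{p}" for l, p in word] + ["</s>"]
    logp = 0.0
    for prev, cur in zip(units, units[1:]):
        nxts = counts.get(prev, {})
        total = sum(nxts.values())
        logp += math.log((nxts.get(cur, 0) + 1) / (total + len(vocab)))
    return logp

# A baseform consistent with the training data scores higher than one
# containing an unseen letter-phoneme pairing ("c" -> "s").
good = score([("c", "k"), ("a", "ae"), ("t", "t")])
bad = score([("c", "s"), ("a", "ae"), ("t", "t")])
```

In a full system, such a model would rank the letter-to-sound module's candidate baseforms, optionally in combination with acoustic scores from spoken examples of the word.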


Bibliographic reference. Chung, Grace / Wang, Chao / Seneff, Stephanie / Filisko, Ed / Tang, Min (2004): "Combining linguistic knowledge and acoustic information in automatic pronunciation lexicon generation", in INTERSPEECH-2004, 1457-1460.