5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Pronunciation Modeling for Large Vocabulary Conversational Speech Recognition

Kristine Ma, George Zavaliagkos, Rukmini Iyer

GTE/BBN Technologies, USA

In this paper, we address the issue of deriving and using more realistic pronunciations to represent words spoken in natural conversational speech. Previous approaches have used automatic phoneme-based rule-learning techniques, linguistic transformation rules, or a phonetically hand-labelled corpus to expand the number of pronunciation variants per word. While rule-based approaches have the advantage of being easily extensible to infrequent or unobserved words, they suffer from overgeneralization. Hand-transcribed data yields a more concise set of new pronunciations, but that set cannot be extended to unobserved or infrequently occurring words. We therefore adopt the hand-labelled corpus scheme to improve pronunciations for frequent multiwords and single words occurring in the training data, and use rule-based techniques to learn pronunciation variants and their weights for the infrequent words. Furthermore, we experiment with a new approach for speaker-dependent pronunciation modeling. The newly expanded dictionaries are evaluated on the Switchboard and Callhome corpora, giving a slight reduction in word recognition error rate.
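The hybrid scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `min_count` frequency threshold, the representation of a rule as a single phone substitution with a probability, and the toy data are all assumptions made for illustration. The abstract does not specify how rules are learned or how weights are estimated, and multiword handling is omitted here.

```python
from collections import Counter


def hand_labelled_variants(observed):
    """Weight each pronunciation by its relative frequency in hand-labelled data."""
    counts = Counter(observed)
    total = sum(counts.values())
    return {pron: n / total for pron, n in counts.items()}


def rule_based_variants(base_pron, rules):
    """Generate weighted variants by applying phone substitution rules.

    Each rule is a hypothetical (source phone, target phone, probability)
    triple; the baseline pronunciation starts with weight 1.0 and all
    weights are normalized to sum to 1.
    """
    weights = {base_pron: 1.0}
    for src, dst, prob in rules:
        if src in base_pron:
            variant = tuple(dst if phone == src else phone for phone in base_pron)
            weights[variant] = weights.get(variant, 0.0) + prob
    total = sum(weights.values())
    return {pron: w / total for pron, w in weights.items()}


def build_lexicon(word_counts, hand_labelled, base_lexicon, rules, min_count=100):
    """Use hand-labelled variants for frequent words, rule-based ones otherwise."""
    lexicon = {}
    for word, base_pron in base_lexicon.items():
        observed = hand_labelled.get(word, [])
        if word_counts.get(word, 0) >= min_count and observed:
            lexicon[word] = hand_labelled_variants(observed)
        else:
            lexicon[word] = rule_based_variants(base_pron, rules)
    return lexicon


# Toy data (invented for illustration only).
word_counts = {"and": 500, "zebra": 2}
hand_labelled = {
    "and": [("ae", "n", "d"), ("ae", "n"), ("ae", "n"), ("ax", "n")],
}
base_lexicon = {
    "and": ("ae", "n", "d"),
    "zebra": ("z", "iy", "b", "r", "ax"),
}
rules = [("ax", "ih", 0.25)]  # hypothetical rule: "ax" may surface as "ih"

lexicon = build_lexicon(word_counts, hand_labelled, base_lexicon, rules)
```

In this sketch the frequent word "and" takes its variants and weights directly from the observed transcriptions, while the infrequent word "zebra" falls back to the baseline pronunciation plus rule-generated variants.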


Bibliographic reference. Ma, Kristine / Zavaliagkos, George / Iyer, Rukmini (1998): "Pronunciation modeling for large vocabulary conversational speech recognition", in ICSLP-1998, paper 0866.