7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Reducing Pronunciation Lexicon Confusion and Using More Data without Phonetic Transcription for Pronunciation Modeling

Fang Zheng (1), Zhanjiang Song (1), Pascale Fung (2), William Byrne (3)

(1) Beijing d-Ear Technologies Co. Ltd., China; (2) Hong Kong University of Science and Technology, China; (3) Johns Hopkins University, USA

A multiple-pronunciation lexicon (MPL) is important for modeling pronunciation variation in spontaneous speech recognition. However, introducing an MPL raises two problems. First, it increases the confusion among lexicon entries, which degrades the recognizer’s performance. Second, it requires more phonetically transcribed data in order to cover as many surface forms as possible. Two solutions are proposed accordingly: a context-dependent weighting method and an iterative forced-alignment based transcription method. Used together, they compensate for the problems the MPL introduces and improve overall performance. Experiments on a naturally spontaneous speech database show that the proposed methods are effective and perform better than other methods.

Bibliographic reference.  Zheng, Fang / Song, Zhanjiang / Fung, Pascale / Byrne, William (2002): "Reducing pronunciation lexicon confusion and using more data without phonetic transcription for pronunciation modeling", In ICSLP-2002, 2461-2464.