8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Conditional and Joint Models for Grapheme-to-Phoneme Conversion

Stanley F. Chen

IBM T.J. Watson Research Center, USA

In this work, we introduce several models for grapheme-to-phoneme conversion: a conditional maximum entropy model, a joint maximum entropy n-gram model, and a joint maximum entropy n-gram model with syllabification. We examine the relative merits of conditional and joint models for this task, and find that joint models have many advantages. We show that the performance of our best model, the joint n-gram model, compares favorably with the best results for English grapheme-to-phoneme conversion reported in the literature, sometimes by a wide margin. In the latter part of this paper, we consider the task of merging pronunciation lexicons expressed in different phone sets. We show that models for grapheme-to-phoneme conversion can be adapted effectively to this task.

Bibliographic reference. Chen, Stanley F. (2003): "Conditional and joint models for grapheme-to-phoneme conversion", In EUROSPEECH-2003, 2033-2036.