EUROSPEECH 2003 - INTERSPEECH 2003
In this work, we introduce several models for grapheme-to-phoneme conversion: a conditional maximum entropy model, a joint maximum entropy n-gram model, and a joint maximum entropy n-gram model with syllabification. We examine the relative merits of conditional and joint models for this task, and find that joint models have many advantages. We show that the performance of our best model, the joint n-gram model, compares favorably with the best results for English grapheme-to-phoneme conversion reported in the literature, sometimes by a wide margin. In the latter part of this paper, we consider the task of merging pronunciation lexicons expressed in different phone sets. We show that models for grapheme-to-phoneme conversion can be adapted effectively to this task.
Bibliographic reference. Chen, Stanley F. (2003): "Conditional and joint models for grapheme-to-phoneme conversion", in EUROSPEECH-2003, 2033-2036.