14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Integrating Conditional Random Fields and Joint Multi-Gram Model with Syllabic Features for Grapheme-to-Phone Conversion

Xiaoxuan Wang, Khe Chai Sim

National University of Singapore, Singapore

In this paper, we present a hybrid system that combines the Joint Multi-gram Model (JMM) and Conditional Random Field (CRF) classifiers to solve the Grapheme-to-Phone (G2P) conversion problem. JMM is a generative language model over n-grams of joint letter-phone units, and it can model longer-range phonetic context. However, it is difficult to incorporate complex features, such as syllabification structure, into JMM. CRFs, on the other hand, can perform G2P by formulating the task as a sequence-labeling problem. They are discriminative classifiers that can incorporate complex feature functions, but they require an alignment between letters and phones. Furthermore, traditional linear-chain CRFs usually employ only bigram output information for practical reasons, which is not sufficient for this task. In this work, JMM and CRFs are combined in tandem to yield a JMM-CRF hybrid system that benefits from both individual approaches. Results on the CMUDict and CELEX databases show that the proposed hybrid system consistently outperforms the individual JMM and CRF systems. Finally, incorporating syllabic features into the CRFs as additional features yields a further performance improvement with the hybrid system.
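As a rough illustration of the two views of the data described in the abstract, the sketch below turns one aligned word into (a) joint letter-phone units and their n-gram counts, as a JMM would consume them, and (b) a sequence-labeling instance with letter-window features, as a CRF would consume it. This is not the authors' implementation: the alignment for "phone", the "_" empty-phone symbol, the feature names, and all function names are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical one-to-one alignment for the word "phone";
# "_" marks a letter that maps to no phone (an assumed convention).
ALIGNED = [("p", "F"), ("h", "_"), ("o", "OW"), ("n", "N"), ("e", "_")]


def joint_units(aligned):
    """Fuse each (letter, phone) pair into one joint unit, e.g. 'p:F'."""
    return [f"{g}:{p}" for g, p in aligned]


def joint_ngrams(units, n=2):
    """Count n-grams over joint units, padding with sentence-boundary symbols."""
    padded = ["<s>"] * (n - 1) + units + ["</s>"]
    return Counter(tuple(padded[i:i + n]) for i in range(len(padded) - n + 1))


def letter_features(letters, i, window=2):
    """Letter-context features for position i (a typical CRF-style window)."""
    feats = {}
    for off in range(-window, window + 1):
        j = i + off
        feats[f"letter[{off:+d}]"] = letters[j] if 0 <= j < len(letters) else "<pad>"
    return feats


def labeling_instance(aligned):
    """Split an aligned word into per-letter feature dicts and phone labels."""
    letters = [g for g, _ in aligned]
    labels = [p for _, p in aligned]
    feats = [letter_features(letters, i) for i in range(len(letters))]
    return feats, labels


units = joint_units(ALIGNED)
bigram_counts = joint_ngrams(units, n=2)
features, labels = labeling_instance(ALIGNED)
```

The generative view only sees the joint units, so extra information such as syllable boundaries has no natural place in it, whereas in the labeling view such information could be added as further keys in each feature dictionary.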


Bibliographic reference. Wang, Xiaoxuan / Sim, Khe Chai (2013): "Integrating conditional random fields and joint multi-gram model with syllabic features for grapheme-to-phone conversion", in INTERSPEECH-2013, 2321-2325.