Accurate grapheme-to-phoneme (g2p) conversion is needed for several speech processing applications, such as automatic speech synthesis and recognition. For some languages, notably English, progress in g2p systems is slow, owing to the intricacy of the associations between letters and sounds. In recent years, several improvements have been obtained either by using variable-length associations in generative models (joint n-grams), or by recasting the problem as a conventional sequence labeling task, which makes it possible to integrate rich dependencies into discriminative models. In this paper, we consider several ways to reconcile these two approaches. By introducing variable-length alignments as latent variables, our Hidden Conditional Random Field (HCRF) models achieve performance comparable to strong generative and discriminative models on the CELEX database.
Bibliographic reference. Lehnen, Patrick / Allauzen, Alexandre / Lavergne, Thomas / Yvon, François / Hahn, Stefan / Ney, Hermann (2013): "Structure learning in hidden conditional random fields for grapheme-to-phoneme conversion", In INTERSPEECH-2013, 2326-2330.
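The variable-length alignments treated as latent variables in the abstract can be illustrated with a toy sketch. The snippet below is not the paper's HCRF model; it merely enumerates the monotone segmentations of a grapheme string against a phoneme sequence, with hypothetical chunk-size limits `max_g` and `max_p`. An HCRF would sum (or maximize) over exactly such hidden alignments during training and decoding.

```python
def alignments(graphemes, phonemes, max_g=2, max_p=2):
    """Enumerate monotone variable-length alignments.

    Each alignment is a list of (grapheme_chunk, phoneme_chunk) pairs,
    where a grapheme chunk covers 1..max_g letters and maps to
    0..max_p phonemes (0 models silent letters).
    """
    if not graphemes and not phonemes:
        yield []
        return
    for i in range(1, min(max_g, len(graphemes)) + 1):
        for j in range(0, min(max_p, len(phonemes)) + 1):
            head = (graphemes[:i], tuple(phonemes[:j]))
            for rest in alignments(graphemes[i:], phonemes[j:], max_g, max_p):
                yield [head] + rest

# Example: the digraph "sh" mapping to a single phoneme "SH".
# The latent-variable view keeps all segmentations, including the
# linguistically sensible one-chunk alignment ("sh" -> "SH").
for a in alignments("sh", ["SH"]):
    print(a)
```

For "sh" vs. ["SH"] this yields three alignments, among them `[("sh", ("SH",))]`; a trained model would learn to weight such joint chunks highly, which is the intuition behind combining joint-n-gram-style associations with discriminative training.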