14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Structure Learning in Hidden Conditional Random Fields for Grapheme-to-Phoneme Conversion

Patrick Lehnen (1), Alexandre Allauzen (2), Thomas Lavergne (2), François Yvon (2), Stefan Hahn (1), Hermann Ney (1)

(1) RWTH Aachen University, Germany
(2) LIMSI, France

Accurate grapheme-to-phoneme (g2p) conversion is needed for several speech processing applications, such as automatic speech synthesis and recognition. For some languages, notably English, g2p systems improve only slowly because the associations between letters and sounds are intricate. In recent years, several improvements have been obtained either by using variable-length associations in generative models (joint-n-grams), or by recasting the problem as a conventional sequence labeling task, which makes it possible to integrate rich dependencies in discriminative models. In this paper, we consider several ways to reconcile these two approaches. By introducing variable-length alignments as latent variables, our Hidden Conditional Random Field (HCRF) models achieve performance comparable to strong generative and discriminative models on the CELEX database.
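As a rough illustration of what "hidden variable-length alignments" means, the sketch below enumerates every monotone way to cut a letter string and a phoneme sequence into paired chunks, then sums over all of them. It is not the authors' model: the function names, the chunk-length limits (`max_g`, `max_p`), the toy `weights` dictionary, and the phoneme symbols are all assumptions made for this example, and it computes only an unnormalized score (the numerator of a latent-variable conditional model), without the partition function or training.

```python
import math


def alignments(graphemes, phonemes, max_g=2, max_p=2):
    """Yield every monotone co-segmentation of the two sequences into
    (grapheme chunk, phoneme chunk) pairs with bounded chunk lengths.

    Assumption for this toy: each chunk has at least one symbol on both sides.
    """
    if not graphemes and not phonemes:
        yield []
        return
    for i in range(1, min(max_g, len(graphemes)) + 1):
        for j in range(1, min(max_p, len(phonemes)) + 1):
            head = (graphemes[:i], phonemes[:j])
            for tail in alignments(graphemes[i:], phonemes[j:], max_g, max_p):
                yield [head] + tail


def log_marginal(graphemes, phonemes, weights, max_g=2, max_p=2):
    """Log of the sum, over all hidden alignments, of exp(alignment score).

    The score of an alignment is the sum of the (hypothetical) weights of
    its chunk pairs; pairs missing from `weights` score 0.
    """
    scores = [
        sum(weights.get(pair, 0.0) for pair in alignment)
        for alignment in alignments(graphemes, phonemes, max_g, max_p)
    ]
    if not scores:
        return float("-inf")  # no alignment is possible under the limits
    top = max(scores)
    return top + math.log(sum(math.exp(s - top) for s in scores))


if __name__ == "__main__":
    # Hypothetical example: the letters "ph" paired with a single phoneme "f".
    for alignment in alignments("ph", ("f",)):
        print(alignment)
    print(log_marginal("ab", ("x", "y"), {}))
```

Brute-force enumeration is used here only to keep the idea visible; a practical system would replace it with dynamic programming over the same alignment space.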


Bibliographic reference.  Lehnen, Patrick / Allauzen, Alexandre / Lavergne, Thomas / Yvon, François / Hahn, Stefan / Ney, Hermann (2013): "Structure learning in hidden conditional random fields for grapheme-to-phoneme conversion", In INTERSPEECH-2013, 2326-2330.