14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Improving LVCSR with Hidden Conditional Random Fields for Grapheme-to-Phoneme Conversion

Stefan Hahn, Patrick Lehnen, Simon Wiesler, Ralf Schlüter, Hermann Ney

RWTH Aachen University, Germany

In virtually every state-of-the-art large vocabulary continuous speech recognition (LVCSR) system, grapheme-to-phoneme (G2P) conversion is applied to generalize beyond a fixed set of words given by a background lexicon. The overall performance of the G2P system has a strong effect on the recognition quality. Typically, generative models based on joint-n-grams are used; some discriminative models achieve competitive performance, but their training time can be considerable. In this work, the effect of using discriminative G2P modeling based on hidden conditional random fields (HCRFs) is analyzed. Besides measuring and comparing G2P quality on the textual level, one focus is the performance of the resulting LVCSR systems. Although the HCRF model does not outperform the generative one on text data, we improved our English QUAERO ASR system by 1.3% relative on a couple of test corpora over a strong baseline by replacing only the G2P strategy.


Bibliographic reference. Hahn, Stefan / Lehnen, Patrick / Wiesler, Simon / Schlüter, Ralf / Ney, Hermann (2013): "Improving LVCSR with hidden conditional random fields for grapheme-to-phoneme conversion", in INTERSPEECH-2013, 495-499.