In virtually every state-of-the-art large vocabulary continuous speech recognition (LVCSR) system, grapheme-to-phoneme (G2P) conversion is applied to generalize beyond the fixed set of words given by a background lexicon. The overall performance of the G2P system has a strong effect on recognition quality. Typically, generative models based on joint-n-grams are used; some discriminative models achieve competitive performance, but their training time can be considerable. In this work, we analyze the effect of discriminative G2P modeling based on hidden conditional random fields (HCRFs). In addition to measuring and comparing G2P quality on a textual level, we focus on the resulting performance of LVCSR systems. Although the HCRF model does not outperform the generative one on text data, we improved our English QUAERO ASR system by 1.3% relative on a couple of test corpora over a strong baseline by replacing only the G2P strategy.
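To make the "1.3% relative" figure concrete, the following minimal sketch shows how a relative improvement is usually computed. It assumes the underlying metric is word error rate (WER), which the abstract does not state explicitly; the function name `relative_improvement` and the numeric values are hypothetical illustrations, not results from the paper.

```python
def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative error reduction in percent.

    Computed as 100 * (baseline - new) / baseline, so a positive value
    means the new system makes fewer errors than the baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values chosen only for illustration (not from the paper):
# an absolute drop of 0.26 points from a 20.0% baseline.
baseline = 20.0
improved = 19.74
print(f"{relative_improvement(baseline, improved):.2f}% relative")
```

Note that a relative improvement of this size corresponds to a much smaller absolute change, which is why the two should not be confused when comparing systems.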
Bibliographic reference. Hahn, Stefan / Lehnen, Patrick / Wiesler, Simon / Schlüter, Ralf / Ney, Hermann (2013): "Improving LVCSR with hidden conditional random fields for grapheme-to-phoneme conversion", In INTERSPEECH-2013, 495-499.