This work introduces a modified WFST-based many-to-many EM-driven alignment algorithm for Grapheme-to-Phoneme (G2P) conversion and presents preliminary experimental results on applying a Recurrent Neural Network Language Model (RNNLM) as an N-best rescoring mechanism for G2P conversion. The alignment algorithm leverages the WFST framework and introduces several simple structural constraints, which yield a small but consistent improvement in Word Accuracy (WA) on a selection of standard baselines. The RNNLM rescoring further extends these gains and achieves state-of-the-art performance on four standard G2P datasets. The system is also shown to be significantly faster than existing solutions. Finally, the complete WFST-based G2P framework is provided as an open-source toolkit.
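As a rough illustration of N-best rescoring in general, and not of the authors' implementation, the sketch below re-ranks G2P hypotheses by linearly interpolating each hypothesis's first-pass cost with a cost from a second model. The names `rescore_nbest`, `toy_lm_cost`, `TOY_COSTS`, the weight `lam`, and all numeric costs are hypothetical; a real system would obtain the second cost from a trained RNNLM rather than a lookup table.

```python
from typing import Callable, List, Tuple

# A hypothesis is (phoneme sequence, first-pass cost); lower cost is better.
Hypothesis = Tuple[List[str], float]

# Hypothetical stand-in for an RNNLM: a fixed table of sequence costs.
TOY_COSTS = {
    ("F", "OW", "N"): 1.0,
    ("P", "HH", "OW", "N"): 4.0,
}


def toy_lm_cost(phones: List[str]) -> float:
    """Return a made-up language-model cost; unseen sequences get a high cost."""
    return TOY_COSTS.get(tuple(phones), 10.0)


def rescore_nbest(
    nbest: List[Hypothesis],
    lm_cost: Callable[[List[str]], float],
    lam: float = 0.5,
) -> List[Hypothesis]:
    """Re-rank hypotheses by (1 - lam) * first-pass cost + lam * LM cost."""
    rescored = [
        (phones, (1.0 - lam) * cost + lam * lm_cost(phones))
        for phones, cost in nbest
    ]
    # Sort ascending so the lowest combined cost comes first.
    return sorted(rescored, key=lambda hyp: hyp[1])


# Toy 2-best list for the word "phone" (costs are invented for illustration).
nbest = [
    (["P", "HH", "OW", "N"], 1.8),  # first-pass favourite
    (["F", "OW", "N"], 2.0),
]
ranked = rescore_nbest(nbest, toy_lm_cost, lam=0.5)
```

With these invented costs, the second-ranked first-pass hypothesis moves to the top after rescoring, which is the effect a rescoring pass is meant to have when the second model is better informed than the first.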
Index Terms: G2P, Alignment, RNNLM, WFST
Bibliographic reference. Novak, Josef R. / Dixon, Paul R. / Minematsu, Nobuaki / Hirose, Keikichi / Hori, Chiori / Kashioka, Hideki (2012): "Improving WFST-based G2P conversion with alignment constraints and RNNLM N-best rescoring", In INTERSPEECH-2012, 2526-2529.