Grapheme-to-phoneme (G2P) conversion is used in virtually every state-of-the-art ASR
system to generalize beyond a fixed set of words. Although
performance is typically already quite good (<10% phoneme error rate) and the
pronunciations of important words are checked by a linguist, further
improvements are still desirable, especially for end-user customization.
In this work, we present and compare five methods/tools for the G2P task. Although most of the methods have already been published and/or are available as open-source software, the experiments reported here are carried out on large state-of-the-art tasks, and the software used is that of the original publications.
In addition to an experimental comparison on text data for a range of languages (i.e. measuring G2P accuracy only), our focus in this paper is on measuring the effect of improved G2P modeling on LVCSR performance for a challenging ASR task. Additionally, the effect of using n-best pronunciation variants instead of only the single best one is investigated briefly.
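As an illustration of the G2P accuracy measure mentioned above, the following minimal sketch computes a phoneme error rate as the total edit distance between reference and hypothesized phoneme sequences, divided by the total number of reference phonemes. This is a common definition and is assumed here; the paper's exact scoring procedure is not given in this abstract. The function names and the toy phoneme sequences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of strings)."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j phonemes of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Total edit distance divided by total reference length (assumed definition)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_ref = sum(len(r) for r in references)
    if total_ref == 0:
        raise ValueError("reference set contains no phonemes")
    total_errors = sum(edit_distance(r, h)
                       for r, h in zip(references, hypotheses))
    return total_errors / total_ref


# Toy data, invented for illustration only.
refs = [["k", "ae", "t"], ["d", "ao", "g"]]
hyps = [["k", "ae", "t"], ["d", "aa", "g", "z"]]
per = phoneme_error_rate(refs, hyps)  # 2 errors over 6 reference phonemes
```

With n-best output, one option would be to score only the top-ranked variant per word; whether the paper does so is not stated in this abstract.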
Index Terms: grapheme-to-phoneme conversion, G2P, ASR
Bibliographic reference. Hahn, Stefan / Vozila, Paul / Bisani, Maximilian (2012): "Comparison of grapheme-to-phoneme methods on large pronunciation dictionaries and LVCSR tasks", In INTERSPEECH-2012, 2538-2541.