4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
Recently, we described a two-step self-learning approach for grapheme-to-phoneme (G2P) conversion. In the first step, grapheme and phoneme strings in the training data are aligned via an iterative Viterbi procedure that may insert graphemic and phonemic nulls where required. In the second step, a Trie structure encoding pronunciation rules is generated. In this paper we describe the alignment module and give alignment accuracies on the NETtalk database. We also compare transcription accuracies for two approaches to the second step on three databases: the NETtalk database, the CMU dictionary, and the French part of the ONOMASTICA lexicon. The two transcription approaches applied in this research are a Trie approach and an approach based on binary decision trees grown by means of the Gelfand-Ravishankar-Delp algorithm [2,3,4]. We discuss the choice of questions for these decision trees: questions about groups of characters (e.g., "is the next letter a vowel?") may yield better trees than questions restricted to individual characters (e.g., "is the next letter an ‘A’?"). Finally, we discuss the implications of our work for G2P conversion.
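The difference between the two kinds of decision-tree question mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the helper names (`next_letter_is`, `next_letter_in`, `split`), the vowel set, and the toy words are all assumptions made for illustration, and the Gelfand-Ravishankar-Delp tree-growing procedure itself is not shown.

```python
# Hypothetical illustration of two question types for a binary decision tree.
# All names and the toy data are assumptions, not taken from the paper.

VOWELS = set("AEIOU")  # assumed vowel group; a real system might define it differently


def next_letter_is(word, i, letter):
    """Individual-character question: is the letter after position i equal to `letter`?"""
    return i + 1 < len(word) and word[i + 1] == letter


def next_letter_in(word, i, group):
    """Group question: does the letter after position i belong to `group`?"""
    return i + 1 < len(word) and word[i + 1] in group


def split(samples, question):
    """Partition (word, position) samples into yes/no branches for one question."""
    yes = [s for s in samples if question(*s)]
    no = [s for s in samples if not question(*s)]
    return yes, no


# Toy samples: each asks about the context of the letter at position 0.
samples = [("CAT", 0), ("COT", 0), ("CUT", 0), ("CRY", 0)]

# A single-character question isolates only one of the vowel contexts...
yes_a, no_a = split(samples, lambda w, i: next_letter_is(w, i, "A"))

# ...whereas one group question gathers all vowel contexts into a single branch.
yes_v, no_v = split(samples, lambda w, i: next_letter_in(w, i, VOWELS))
```

In this toy case the group question separates the three vowel contexts from the consonant context in a single split, while a tree limited to individual-character questions would need one split per vowel. Whether this produces better trees on real data is the empirical question the paper addresses.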
Bibliographic reference. Andersen, Ove / Kuhn, Roland / Lazaridès, Ariane / Dalsgaard, Paul / Haas, Jürgen / Nöth, Elmar (1996): "Comparison of two tree-structured approaches for grapheme-to-phoneme conversion", In ICSLP-1996, 1700-1703.