4th International Conference on Spoken Language Processing

Philadelphia, PA, USA
October 3-6, 1996

Comparison of Two Tree-Structured Approaches for Grapheme-to-Phoneme Conversion

Ove Andersen (1), Roland Kuhn (2), Ariane Lazaridès (2), Paul Dalsgaard (1), Jürgen Haas (3), Elmar Nöth (3)

(1) Center for PersonKommunikation, Aalborg University, Denmark
(2) Centre de recherche informatique de Montréal, Canada
(3) University of Erlangen-Nürnberg, Germany

Recently, we described a two-step self-learning approach for grapheme-to-phoneme (G2P) conversion [1]. In the first step, the grapheme and phoneme strings in the training data are aligned by an iterative Viterbi procedure that inserts graphemic and phonemic nulls where required. In the second step, a Trie structure encoding pronunciation rules is generated. In this paper we describe the alignment module and give alignment accuracies on the NETtalk database. We also compare transcription accuracies for two approaches to the second step on three databases: the NETtalk database, the CMU dictionary, and the French part of the ONOMASTICA lexicon. The two transcription approaches are the Trie approach [1] and an approach based on binary decision trees grown by means of the Gelfand-Ravishankar-Delp algorithm [2,3,4]. We discuss the choice of questions for these decision trees: questions about groups of characters (e.g., "is the next letter a vowel?") may yield better trees than questions restricted to individual characters (e.g., "is the next letter an ‘A’?"). Finally, we discuss the implications of our work for G2P conversion.
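
To make the first step more concrete, the following is a minimal sketch of a single alignment pass that pairs graphemes with phonemes and inserts nulls on either side where needed. It is not the authors' implementation: the names `align`, `toy_cost`, `LIKELY`, and the null symbol `_` are hypothetical, and the hand-picked costs stand in for the pair scores that an iterative procedure would re-estimate from the training data between passes.

```python
NULL = "_"  # hypothetical symbol for a graphemic or phonemic null

# Assumed toy table of grapheme/phoneme pairs considered likely.
LIKELY = {("p", "f"), ("o", "o"), ("n", "n"), ("x", "k")}


def toy_cost(g, p):
    """Hand-picked costs (an assumption, not estimated from data)."""
    if (g, p) in LIKELY:
        return 0.0
    if g == NULL or p == NULL:
        return 1.0  # pairing a symbol with a null
    return 3.0      # pairing an unlikely grapheme/phoneme pair


def align(graphemes, phonemes, cost):
    """Return the minimum-cost list of (grapheme, phoneme) pairs.

    Dynamic programming over a grid: each step either pairs a grapheme
    with a phoneme, pairs a grapheme with a phonemic null, or pairs a
    graphemic null with a phoneme.
    """
    n, m = len(graphemes), len(phonemes)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            moves = []
            if i < n and j < m:
                moves.append((i + 1, j + 1, graphemes[i], phonemes[j]))
            if i < n:
                moves.append((i + 1, j, graphemes[i], NULL))
            if j < m:
                moves.append((i, j + 1, NULL, phonemes[j]))
            for ni, nj, g, p in moves:
                total = best[i][j] + cost(g, p)
                if total < best[ni][nj]:
                    best[ni][nj] = total
                    back[ni][nj] = (i, j, g, p)
    # Trace the best path back from the end of both strings.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        i, j, g, p = back[i][j]
        pairs.append((g, p))
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    print(align(list("phone"), ["f", "o", "n"], toy_cost))
    print(align(["x"], ["k", "s"], toy_cost))
```

With these toy costs, the silent letters of "phone" are paired with phonemic nulls, and the second phoneme of "x" is paired with a graphemic null. In a self-learning setting the cost table would be derived from pair counts in the previous pass rather than fixed by hand.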


Bibliographic reference. Andersen, Ove / Kuhn, Roland / Lazaridès, Ariane / Dalsgaard, Paul / Haas, Jürgen / Nöth, Elmar (1996): "Comparison of two tree-structured approaches for grapheme-to-phoneme conversion", in ICSLP-1996, 1700-1703.