This paper describes a reversible letter-to-sound/sound-to-letter system based on a strategy that combines data-driven techniques with a rule-based formalism. Our approach is to provide a hierarchical analysis of a word, including information such as stress pattern, morphology and syllabification, which incorporates probabilities that are trained from a parsed lexicon. Our training and testing corpora consisted of spellings and pronunciations for the high frequency portion of the Brown Corpus (10,000 words). We augmented the phonetic labels with markers indicating morphology and stress. We report here on two distinct grammars representing a historical perspective. Our early work with the first grammar inspired us to modify the grammar formalism, leading to greater constraint with fewer rules. We evaluated our performance on letter-to-sound generation in terms of whole word accuracy as well as phoneme accuracy. For the unseen test set, we achieved a word accuracy of 69.3% and a phone accuracy of 91.7% using a set of 49 distinct phonemes. Although we have no formal results on sound-to-letter generation, we believe that this formalism will be applicable for entering unknown words orally into a recognition system.
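The two evaluation measures named above can be illustrated with a minimal sketch. The abstract does not say how phone accuracy was scored, so this sketch assumes a common convention: a word counts as correct only if its whole phoneme sequence matches the reference, and phone accuracy is one minus the total edit distance divided by the total number of reference phonemes. The function names `edit_distance` and `score` and the sample transcriptions are hypothetical, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def score(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> Tuple[float, float]:
    """Return (word accuracy, phone accuracy) under the assumed scoring convention."""
    if len(refs) != len(hyps) or not refs:
        raise ValueError("refs and hyps must be non-empty and of equal length")
    words_correct = sum(1 for r, h in zip(refs, hyps) if list(r) == list(h))
    total_phones = sum(len(r) for r in refs)
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return words_correct / len(refs), 1.0 - total_errors / total_phones


# Hypothetical example: two words, one vowel substituted in the second.
refs = [["k", "ae", "t"], ["d", "ao", "g"]]
hyps = [["k", "ae", "t"], ["d", "aa", "g"]]
word_acc, phone_acc = score(refs, hyps)
```

Under this convention a single wrong phoneme makes the whole word incorrect, which is why word accuracy can sit well below phone accuracy, as in the figures reported above.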
Bibliographic reference. Hunnicutt, Sheri / Meng, Helen / Seneff, Stephanie / Zue, Victor W. (1993): "Reversible letter-to-sound sound-to-letter generation based on parsing word morphology", In EUROSPEECH'93, 763-766.