Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Decision Tree Based Text-To-Phoneme Mapping for Speech Recognition

Janne Suontausta, Juha Häkkinen

Speech and Audio Systems Laboratory, Nokia Research Center, Tampere, Finland

In many embedded speech recognition systems, the phonetic transcriptions of the vocabulary items, i.e., the lexicons, cannot be stored on the device beforehand. A text-to-phoneme mapping is therefore needed to create the transcriptions from plain text. Several approaches have been evaluated in the literature. In this paper, a decision-tree-based text-to-phoneme mapping is studied. A decision tree is trained for each letter, using information-theoretic criteria, on a pronunciation dictionary that contains phoneme transcriptions for a large number of words. Contextual information is used to create the mapping. In our experiments, the mapping was constructed on the Carnegie Mellon Pronouncing Dictionary [1]. The phoneme accuracy of the most effective mapping was 99% on the training set and 91% on the test set of the pronunciation dictionary. The mapping was also implemented in a speaker-independent isolated-word recognition system. When the training lexicon contained the test vocabulary, the recognition rates in the clean and car-noise test environments were close to the baseline rates obtained with the correct transcriptions. When the test vocabulary differed significantly from the training vocabulary, the mapping performed below our expectations.
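The per-letter training idea can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the context width, the exact splitting criterion, pruning, or how letters are aligned to phonemes. The sketch assumes a lexicon already aligned with one phoneme (or a hypothetical null symbol `_` for silent letters) per letter, uses the neighboring letters within a hypothetical window of two on each side as features, and grows an ID3-style tree per letter by maximizing information gain. All names (`context`, `build_tree`, `train`, `transcribe`, `phoneme_accuracy`) and the toy lexicon are hypothetical. Phoneme accuracy is taken here as one minus Levenshtein errors over reference length, a common definition that may differ from the paper's.

```python
import math
from collections import Counter, defaultdict

PAD = "#"   # hypothetical word-boundary symbol
NULL = "_"  # hypothetical "silent letter" phoneme from a prior alignment step


def context(word, i, width=2):
    """Letters at offsets -width..+width around position i (offset 0 excluded)."""
    padded = PAD * width + word + PAD * width
    j = i + width
    return {off: padded[j + off] for off in range(-width, width + 1) if off != 0}


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def build_tree(samples, features):
    """ID3-style tree: split on the context offset with the largest information gain."""
    labels = [ph for _, ph in samples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf: a single phoneme
    base = entropy(labels)
    best, best_gain, best_parts = None, 1e-9, None
    for f in features:
        parts = defaultdict(list)
        for ctx, ph in samples:
            parts[ctx[f]].append((ctx, ph))
        remainder = sum(len(s) / len(samples) * entropy([ph for _, ph in s])
                        for s in parts.values())
        if base - remainder > best_gain:
            best, best_gain, best_parts = f, base - remainder, parts
    if best is None:
        return majority  # no split reduces entropy
    rest = [f for f in features if f != best]
    children = {v: build_tree(s, rest) for v, s in best_parts.items()}
    return (best, children, majority)  # internal node


def predict(tree, ctx):
    while isinstance(tree, tuple):
        feature, children, majority = tree
        if ctx[feature] not in children:
            return majority  # unseen context letter: fall back to majority phoneme
        tree = children[ctx[feature]]
    return tree


def train(aligned_lexicon, width=2):
    """aligned_lexicon: (word, phonemes) pairs with exactly one phoneme (or NULL) per letter."""
    samples = defaultdict(list)
    for word, phones in aligned_lexicon:
        assert len(word) == len(phones), word
        for i, ph in enumerate(phones):
            samples[word[i]].append((context(word, i, width), ph))
    offsets = [off for off in range(-width, width + 1) if off != 0]
    # One tree per letter, as described in the abstract.
    return {letter: build_tree(s, offsets) for letter, s in samples.items()}


def transcribe(trees, word, width=2):
    phones = (predict(trees[ch], context(word, i, width))
              for i, ch in enumerate(word) if ch in trees)
    return [ph for ph in phones if ph != NULL]


def edit_distance(ref, hyp):
    """Levenshtein distance = substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]


def phoneme_accuracy(pairs):
    """pairs: (reference, hypothesis) phoneme lists; accuracy = 1 - errors / reference length."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    return 1 - errors / sum(len(ref) for ref, _ in pairs)


LEXICON = [  # toy, hand-aligned entries using CMU-style phoneme symbols
    ("cat", ["K", "AE", "T"]),
    ("cot", ["K", "AA", "T"]),
    ("city", ["S", "IH", "T", "IY"]),
    ("cent", ["S", "EH", "N", "T"]),
    ("kite", ["K", "AY", "T", NULL]),
]
trees = train(LEXICON)
print(transcribe(trees, "kite"))  # ['K', 'AY', 'T']
print(round(phoneme_accuracy([(["S", "AY", "T"], transcribe(trees, "cite"))]), 3))  # 0.667
```

On the unseen word "cite", the toy trees produce `S IH T` instead of `S AY T`, a small illustration of why accuracy drops when test words differ from the training vocabulary.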

Reference

  1. Weide, R.L., Carnegie Mellon Pronouncing Dictionary, Release 0.4, http://www.cs.cmu.edu.

Bibliographic reference.  Suontausta, Janne / Häkkinen, Juha (2000): "Decision tree based text-to-phoneme mapping for speech recognition", In ICSLP-2000, vol.2, 831-834.