ISCA Archive ICSLP 2000

Decision tree based text-to-phoneme mapping for speech recognition

Janne Suontausta, Juha Häkkinen

In many embedded speech recognition systems, the phonetic transcriptions of the vocabulary items, i.e., the lexicons, cannot be stored on the device beforehand. A text-to-phoneme mapping is therefore needed to create the transcriptions from plain text. Several approaches have been evaluated in the literature. In this paper, a decision tree based text-to-phoneme mapping is studied. A decision tree is trained for each letter according to information-theoretic criteria on a pronunciation dictionary that contains the phoneme transcriptions of a large number of words. Context information is used to create the mapping. In our experiments, the mapping was constructed on the Carnegie Mellon pronunciation dictionary [1]. The phoneme accuracy of the most effective mapping was 99% on the training set and 91% on the test set of the pronunciation dictionary. The mapping was also implemented in a speaker-independent isolated word recognition system. When the training lexicon contained the test vocabulary, the recognition rates in both the clean and the car noise test environments were close to the baseline rates obtained with the correct transcriptions. When the test vocabulary differed significantly from the training vocabulary, the mapping performed below our expectations.
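The per-letter training described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each word is already aligned one phoneme per letter (with a hypothetical null phoneme `_` for silent letters), that the context consists of the two neighboring letters on each side, and that the splitting criterion is plain information gain. The toy lexicon and all function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

PAD = "#"    # assumed padding symbol for positions outside the word
NULL = "_"   # assumed null phoneme for letters that produce no sound

# Hypothetical hand-aligned toy lexicon: one phoneme (or NULL) per letter.
TOY_LEXICON = [
    ("cat", ["k", "ae", "t"]),
    ("city", ["s", "ih", "t", "iy"]),
    ("cot", ["k", "aa", "t"]),
    ("cell", ["s", "eh", "l", NULL]),
    ("cut", ["k", "ah", "t"]),
]

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of phoneme labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(labels).values())

def contexts(word, phonemes, width=2):
    """Yield (letter, context, phoneme); context = neighboring letters."""
    padded = PAD * width + word + PAD * width
    for i, (letter, ph) in enumerate(zip(word, phonemes)):
        j = i + width
        ctx = tuple(padded[j + d] for d in range(-width, width + 1) if d != 0)
        yield letter, ctx, ph

def build_tree(samples, min_gain=1e-9):
    """Grow a tree from (context, phoneme) pairs; a leaf is a phoneme string."""
    labels = [ph for _, ph in samples]
    majority = Counter(labels).most_common(1)[0][0]
    base = entropy(labels)
    best = None
    if base > 0:
        # Try one question per context position: "which letter is here?"
        for pos in range(len(samples[0][0])):
            groups = defaultdict(list)
            for ctx, ph in samples:
                groups[ctx[pos]].append((ctx, ph))
            remainder = sum(len(g) / len(samples) * entropy([p for _, p in g])
                            for g in groups.values())
            gain = base - remainder
            if gain > min_gain and (best is None or gain > best[0]):
                best = (gain, pos, groups)
    if best is None:
        return majority  # pure node or no informative question left
    _, pos, groups = best
    return {"pos": pos, "default": majority,
            "children": {v: build_tree(g, min_gain) for v, g in groups.items()}}

def train(lexicon, width=2):
    """Train one decision tree per letter."""
    per_letter = defaultdict(list)
    for word, phonemes in lexicon:
        for letter, ctx, ph in contexts(word, phonemes, width):
            per_letter[letter].append((ctx, ph))
    return {letter: build_tree(s) for letter, s in per_letter.items()}

def predict(trees, word, width=2):
    """Map a word to phonemes; unseen contexts fall back to the majority."""
    out = []
    for letter, ctx, _ in contexts(word, [None] * len(word), width):
        node = trees.get(letter)  # None for letters never seen in training
        while isinstance(node, dict):
            node = node["children"].get(ctx[node["pos"]], node["default"])
        if node and node != NULL:
            out.append(node)
    return out

trees = train(TOY_LEXICON)
print(predict(trees, "cit"))   # ['s', 'ih', 't']
```

In this toy example the tree for the letter `c` splits on the letter immediately to its right, since that question removes all uncertainty between `k` and `s`. A real system would additionally need a letter-to-phoneme alignment step and a stopping or pruning rule suited to a large dictionary, neither of which is shown here.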

[1] Weide, R.L., Carnegie Mellon Pronouncing Dictionary, Release 0.4,

doi: 10.21437/ICSLP.2000-398

Cite as: Suontausta, J., Häkkinen, J. (2000) Decision tree based text-to-phoneme mapping for speech recognition. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 2, 831-834, doi: 10.21437/ICSLP.2000-398
