Sixth International Conference on Spoken Language Processing
October 16-20, 2000
Decision Tree Based Text-To-Phoneme Mapping for Speech Recognition
Janne Suontausta, Juha Häkkinen
Speech and Audio Systems Laboratory, Nokia Research Center, Tampere, Finland
In many embedded speech recognition systems, the phonetic
transcriptions of the vocabulary items, i.e., the lexicons, cannot
be stored on the device beforehand. A text-to-phoneme mapping
functionality is hence needed to create the transcriptions from
plain text. Several approaches have been evaluated in the literature.
In this paper, a decision tree based text-to-phoneme mapping
is studied. A decision tree is trained for each letter according
to information theoretic criteria on a pronunciation dictionary
that contains the phoneme transcriptions for a large number
of words. Context information is utilized to create the mapping.
In our experiments, the mapping was constructed on the Carnegie
Mellon pronunciation dictionary. The phoneme accuracy
of the most effective mapping was 99% on the training set and
91% on the test set of the pronunciation dictionary. The mapping
was also implemented in a speaker independent isolated
word recognition system. The recognition rates in the clean and
car noise test environments were close to the baseline recognition
rates obtained with the correct transcriptions when the
training lexicon contained the test vocabulary. When the test
vocabulary differed significantly from the training vocabulary,
the mapping performed below our expectations.
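
The per-letter training described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the dictionary has already been aligned so that each letter carries exactly one phoneme symbol, that the only context used is the neighbouring letters within a fixed window, and that the "information theoretic criterion" is plain information gain with one branch per context letter. The function names, the padding symbol, the toy word list and its phoneme symbols are all hypothetical.

```python
import math
from collections import Counter, defaultdict

PAD = "#"  # hypothetical padding symbol for positions outside the word


def samples_for_letter(aligned, letter, width=2):
    """Collect (context, phoneme) pairs for one letter.

    aligned: list of (word, phonemes), one phoneme symbol per letter
    ("_" marks a letter that produces no phoneme).
    """
    out = []
    for word, phones in aligned:
        padded = PAD * width + word + PAD * width
        for i, ch in enumerate(word):
            if ch == letter:
                j = i + width
                ctx = {off: padded[j + off]
                       for off in range(-width, width + 1) if off != 0}
                out.append((ctx, phones[i]))
    return out


def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(labels).values())


def info_gain(samples, offset):
    """Entropy reduction from splitting on the letter at a context offset."""
    groups = defaultdict(list)
    for ctx, ph in samples:
        groups[ctx[offset]].append(ph)
    remainder = sum(len(g) / len(samples) * entropy(g)
                    for g in groups.values())
    return entropy([ph for _, ph in samples]) - remainder


def grow(samples, offsets):
    """Recursively grow a tree; leaves hold the majority phoneme."""
    labels = [ph for _, ph in samples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not offsets:
        return {"leaf": majority}
    best = max(offsets, key=lambda o: info_gain(samples, o))
    if info_gain(samples, best) <= 0:
        return {"leaf": majority}
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[0][best]].append(sample)
    rest = [o for o in offsets if o != best]
    return {"offset": best, "default": majority,
            "children": {v: grow(g, rest) for v, g in groups.items()}}


def predict(tree, ctx):
    """Follow context letters down the tree; fall back on unseen letters."""
    while "leaf" not in tree:
        child = tree["children"].get(ctx[tree["offset"]])
        if child is None:
            return tree["default"]
        tree = child
    return tree["leaf"]


# Toy aligned dictionary (hypothetical entries and phoneme symbols).
toy = [
    ("cat", ["k", "ae", "t"]),
    ("city", ["s", "ih", "t", "iy"]),
    ("cot", ["k", "aa", "t"]),
    ("cell", ["s", "eh", "l", "_"]),
]
tree_c = grow(samples_for_letter(toy, "c"), [-2, -1, 1, 2])
```

In this toy data the letter immediately to the right of "c" separates the two pronunciations perfectly, so the tree for "c" asks about that position first. A full system would hold one such tree per letter and transcribe a word by querying the tree of each letter in turn.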
- Weide, R.L., Carnegie Mellon Pronouncing Dictionary,
Release 0.4, http://www.cs.cmu.edu.
Suontausta, Janne / Häkkinen, Juha (2000):
"Decision tree based text-to-phoneme mapping for speech recognition",
In ICSLP-2000, vol.2, 831-834.