EUROSPEECH 2003 - INTERSPEECH 2003
The pronunciation model that maps the written form of words to their pronunciations is called the text-to-phoneme (TTP) mapping. Such a mapping is commonly used in automatic speech recognition (ASR) as well as in text-to-speech (TTS) applications. Rule-based TTP mappings can be derived for structured languages, such as Finnish and Japanese. Data-driven TTP mappings are usually applied to non-structured languages, such as English and Danish. Artificial neural network (ANN) and decision tree (DT) approaches are commonly applied to this task. Compared to the ANN methods, the DT methods usually provide more accurate pronunciation models. The DT methods can, however, lead to a set of models with a high memory footprint if the mappings between letters and phonemes are complex. In this paper, we present a weighted entropy training method for the DT-based TTP mapping. Statistical information about the vocabulary is utilized in the training process in order to optimize the TTP performance for pre-defined memory requirements. The results obtained in the simulation experiments indicate that the memory requirements of the TTP models can be significantly reduced without degrading the mapping accuracy. The applicability of the approach is also verified in speech recognition experiments.
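The abstract does not state the weighting formula, so the following is only a minimal sketch of what a weighted entropy criterion for a decision tree split could look like. It assumes that each training sample (a letter context paired with its phoneme) carries a non-negative weight, for example a word frequency taken from vocabulary statistics. The function names `weighted_entropy` and `split_gain`, the sample layout, and the choice of weights are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def weighted_entropy(samples):
    """Entropy in bits of the phoneme distribution at a tree node.

    samples: iterable of (phoneme, weight) pairs. The weight is an
    assumed per-sample importance (e.g. a word frequency); the paper's
    actual weighting scheme is not given in the abstract.
    """
    totals = defaultdict(float)
    for phoneme, weight in samples:
        totals[phoneme] += weight
    total = sum(totals.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for mass in totals.values():
        if mass > 0:
            p = mass / total
            entropy -= p * math.log2(p)
    return entropy


def split_gain(samples, question):
    """Weighted information gain of a yes/no question about the context.

    samples: list of (context, phoneme, weight) triples.
    question: callable mapping a context to True or False.
    """
    parent = [(ph, w) for _, ph, w in samples]
    yes = [(ph, w) for ctx, ph, w in samples if question(ctx)]
    no = [(ph, w) for ctx, ph, w in samples if not question(ctx)]
    total = sum(w for _, w in parent)
    if total == 0:
        return 0.0
    w_yes = sum(w for _, w in yes)
    w_no = sum(w for _, w in no)
    return (
        weighted_entropy(parent)
        - (w_yes / total) * weighted_entropy(yes)
        - (w_no / total) * weighted_entropy(no)
    )


# Toy data: the letter "c" followed by "a" maps to /k/, followed by "e" to /s/.
# Contexts are (previous letter, next letter); weights are made-up frequencies.
toy = [
    (("_", "a"), "k", 3.0),
    (("_", "e"), "s", 1.0),
]
gain = split_gain(toy, lambda ctx: ctx[1] == "a")
```

Under these assumptions, a split that only separates rarely seen contexts yields a small gain and is a natural candidate for pruning, which is one plausible way a weighted criterion could trade model size against accuracy on frequent words.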
Bibliographic reference. Tian, Jilei / Suontausta, Janne / Hakkinen, Juha (2003): "Weighted entropy training for the decision tree based text-to-phoneme mapping", In EUROSPEECH-2003, 217-220.