ISCA Archive PMLA 2002

Improved pronunciation modeling by properly integrating better approaches for baseform generation, ranking and pruning

Ming-yi Tsai, Fu-chiang Chou, Lin-shan Lee

In this paper, a complete framework for the pronunciation modeling process is discussed and analyzed as the integration of three individual but mutually interacting stages: baseform generation, baseform ranking, and baseform pruning. The characteristics of the techniques used in each stage, and the interactions among them, are reflected in the overall performance of pronunciation modeling. Consequently, pronunciation variation can be better handled by integrating appropriately chosen techniques for each of the three stages.

In the baseform generation stage, an improved approach is proposed for automatically constructing a fine word confusion table with an expanded phone-unit dictionary. In the baseform ranking stage, three strategies were evaluated and analyzed: the conventional pronunciation frequency (pf)-based strategy, our recently proposed pronunciation frequency and inverse word frequency (pf-iwf)-based strategy, and a newly proposed iterative pf-iwf-based strategy. The desirable property of the pf-iwf-based strategies was verified and discussed. In the baseform pruning stage, the traditional probability-based and count-based criteria and our recently proposed entropy-based criterion were investigated. When integrated with better approaches in the other two stages, namely a fine confusion table and pf-iwf-based ranking, entropy-based pruning proved superior to probability-based and count-based pruning, and the best performance was achieved. The experiments also indicate that the evaluation of approaches used in one stage may be obscured by inappropriate approaches used in the other stages. The newly proposed iterative pf-iwf-based ranking, on the other hand, yielded only marginal improvements over the recently proposed pf-iwf-based ranking. A larger corpus will very probably be needed to verify the more complete behavior of pronunciation modeling in this situation. Further investigation is in progress.
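The abstract does not give the scoring formulas, so the following is only a rough sketch of how pf-based and pf-iwf-based ranking might differ. It assumes, by analogy with tf-idf, that the inverse word frequency of a pronunciation is the log of the vocabulary size divided by the number of words sharing that pronunciation, plus one. The function names `rank_baseforms` and `keep_top_k`, the input format, and the top-k cut (a generic stand-in, not one of the paper's pruning criteria) are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def rank_baseforms(observations, use_iwf=True):
    """Rank candidate pronunciations (baseforms) for each word.

    observations: iterable of (word, pronunciation) pairs, e.g. from
    aligning recognized phone strings with reference words.
    ASSUMPTION: iwf = log(N_words / n_words_sharing_pron) + 1, a
    tf-idf-style stand-in, not the formula from the paper.
    """
    counts = defaultdict(Counter)   # word -> pronunciation counts
    words_with = defaultdict(set)   # pronunciation -> words using it
    for word, pron in observations:
        counts[word][pron] += 1
        words_with[pron].add(word)

    n_words = len(counts)
    ranked = {}
    for word, pron_counts in counts.items():
        total = sum(pron_counts.values())
        scored = []
        for pron, n in pron_counts.items():
            pf = n / total  # pronunciation frequency within this word
            if use_iwf:
                iwf = math.log(n_words / len(words_with[pron])) + 1.0
            else:
                iwf = 1.0   # plain pf-based ranking
            scored.append((pron, pf * iwf))
        # Highest score first; ties broken alphabetically for determinism.
        scored.sort(key=lambda item: (-item[1], item[0]))
        ranked[word] = scored
    return ranked


def keep_top_k(ranked, k):
    """Generic pruning stand-in: keep the k best-ranked baseforms per word."""
    return {word: [pron for pron, _ in scored[:k]]
            for word, scored in ranked.items()}
```

Under this assumed weighting, a pronunciation shared by several words is down-weighted relative to one that is unique to a single word, so a frequent but confusable baseform can fall below a less frequent but more discriminative one.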

In addition, the interaction between pronunciation modeling and language modeling is discussed based on test results obtained with a cheating language model trained on the test data only. The potential improvement achievable through pronunciation modeling is even more significant with this cheating language model than with a fair language model. This indicates that there is still much room for improvement if the interaction between better language modeling and better pronunciation modeling is further investigated.


Cite as: Tsai, M.-y., Chou, F.-c., Lee, L.-s. (2002) Improved pronunciation modeling by properly integrating better approaches for baseform generation, ranking and pruning. Proc. ITRW on Pronunciation Modeling and Lexicon Adaptation for Spoken Language Technology (PMLA 2002), 77-82

@inproceedings{tsai02_pmla,
  author={Ming-yi Tsai and Fu-chiang Chou and Lin-shan Lee},
  title={{Improved pronunciation modeling by properly integrating better approaches for baseform generation, ranking and pruning}},
  year=2002,
  booktitle={Proc. ITRW on Pronunciation Modeling and Lexicon Adaptation for Spoken Language Technology (PMLA 2002)},
  pages={77--82}
}