To improve the performance and usability of speech recognition devices, most applications need to let users add new or personalized words to the system vocabulary. Voice tagging is a simple example: it uses speaker-dependent spoken samples to generate baseform transcriptions of the spoken words. More sophisticated techniques use both spoken samples and the text of the new words to generate baseform transcriptions. In this paper, we propose a new approach to the problem, in which Bayesian networks model the letter-to-sound rule probabilities. Compared with the common decision-tree-based method, this new approach shows a definite advantage.
Cite as: Ma, C., Randolph, M.A. (2001) An approach to automatic phonetic baseform generation based on Bayesian networks. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 1453-1457, doi: 10.21437/Eurospeech.2001-27
@inproceedings{ma01_eurospeech, author={Changxue Ma and Mark A. Randolph}, title={{An approach to automatic phonetic baseform generation based on Bayesian networks}}, year=2001, booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)}, pages={1453--1457}, doi={10.21437/Eurospeech.2001-27} }