This paper describes a method for preparing a word transcription dictionary from utterances of training sentences. The word transcriptions are first extracted from the acoustic-phonetic labels of the training sentences and then compressed using a cost measure of the joint word likelihood. The decoding performance obtained with this dictionary in a newly developed speaker-independent continuous speech recognition system is compared with the performance obtained with a standard dictionary. The experimental results indicate that dictionaries prepared from sentence utterances can reflect the reduced pronunciations and contextual effects of continuous speech, and yield better decoding accuracy than word transcriptions derived from isolated utterances. On a task with an 853-word vocabulary and a test-set grammar perplexity of 104, the system achieved a word accuracy of 85.3%, which represents an error reduction of 23% compared with the decoding results obtained with a standard dictionary.
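The relation between the reported word accuracy and the reported error reduction can be checked with a short calculation. The sketch below assumes that the word error rate is 100% minus the word accuracy and that the 23% figure is a relative reduction; neither assumption is stated in the abstract, the function name is hypothetical, and the implied baseline is a derived estimate, not a figure reported by the authors.

```python
def implied_baseline_error(word_accuracy_pct, relative_reduction_pct):
    """Estimate the baseline word error rate (in %) implied by a reported
    accuracy and a relative error reduction.

    Assumes error = 100 - accuracy, and that the reduction is relative
    to the baseline error (both are assumptions, not from the abstract).
    """
    new_error = 100.0 - word_accuracy_pct
    return new_error / (1.0 - relative_reduction_pct / 100.0)


# Figures reported in the abstract.
new_error = 100.0 - 85.3
baseline_error = implied_baseline_error(85.3, 23.0)

print(f"error with sentence-derived dictionary: {new_error:.1f}%")
print(f"implied error with standard dictionary: {baseline_error:.1f}%")
print(f"implied accuracy with standard dictionary: {100.0 - baseline_error:.1f}%")
```

Under these assumptions the standard dictionary would correspond to a word error rate of roughly 19%, that is, a word accuracy of roughly 81%.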
Cite as: Zhao, Y., Wakita, H., Zhuang, X. (1991) Generate word transcription dictionary from sentence utterances and evaluate its effect on speaker-independent continuous speech recognition. Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991), 679-682, doi: 10.21437/Eurospeech.1991-167
@inproceedings{zhao91_eurospeech,
  author={Yunxin Zhao and Hisashi Wakita and Xinhua Zhuang},
  title={{Generate word transcription dictionary from sentence utterances and evaluate its effect on speaker-independent continuous speech recognition}},
  year=1991,
  booktitle={Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991)},
  pages={679--682},
  doi={10.21437/Eurospeech.1991-167}
}