Variation in the synchrony between two or more simultaneous articulatory gestures in speech may cause large variability in the acoustic signal and lower the accuracy and robustness of recognition systems. This report describes a technique that accounts for this effect by predicting alternative pronunciations of an utterance. A formant-based speech production system generates the reference templates used for recognition. The production system systematically varies the delay between the voicing transition and the formant movements, forming different paths through a transition network at phoneme boundaries. In a pilot experiment, the recogniser's behaviour was examined for utterances with different time positions of the devoicing of phrase-final vowels.
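The idea of alternative paths at a phoneme boundary can be illustrated with a minimal sketch. This is not the author's implementation: the frame representation, the function name `boundary_paths`, the choice of the midpoint as the nominal devoicing point, and the delay values are all assumptions made for illustration. Each path is a sequence of frames through the same formant transition, differing only in the step at which voicing switches off.

```python
from typing import List, Tuple

# Hypothetical frame: (formant transition step, voiced?)
Frame = Tuple[int, bool]


def boundary_paths(n_steps: int, delays: List[int]) -> List[List[Frame]]:
    """Return one frame sequence per voicing delay.

    The formant movement runs through steps 0..n_steps-1 in every path.
    The nominal devoicing point is assumed to be the midpoint of the
    transition; each delay shifts it earlier (negative) or later (positive).
    """
    nominal = n_steps // 2
    paths = []
    for delay in delays:
        # Clamp the switch point so it stays within the transition.
        switch = min(max(nominal + delay, 0), n_steps)
        # Frames before the switch point are voiced, the rest unvoiced.
        paths.append([(step, step < switch) for step in range(n_steps)])
    return paths


# Three alternative paths: early, nominal, and late devoicing.
paths = boundary_paths(6, [-2, 0, 2])
for path in paths:
    print("".join("V" if voiced else "u" for _, voiced in path))
```

In a recogniser, each such sequence would correspond to one branch of the network at the boundary, so that the best-matching timing is selected during the search.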
Cite as: Blomberg, M. (1991) Modelling articulatory inter-timing variation in a speech recognition system based on synthetic references. Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991), 789-792, doi: 10.21437/Eurospeech.1991-205
@inproceedings{blomberg91_eurospeech,
  author={M. Blomberg},
  title={{Modelling articulatory inter-timing variation in a speech recognition system based on synthetic references}},
  year=1991,
  booktitle={Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991)},
  pages={789--792},
  doi={10.21437/Eurospeech.1991-205}
}