We describe a speech recognition system that uses articulatory parameters as its basic features and models them with phone-dependent linear dynamic models. The system first estimates articulatory trajectories from the speech signal. A recurrent neural network, trained on real articulatory data, produces estimates of the x and y coordinates of 7 actual articulator positions in the midsagittal plane every 2 milliseconds. The output of this network is then passed to a set of linear dynamic models, which perform phone recognition.
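The second stage can be illustrated with a minimal sketch. The abstract does not give the model equations, so the code below assumes the standard linear-Gaussian state-space form, x_t = F x_{t-1} + w_t and y_t = H x_t + v_t, scores a segment with a Kalman filter, and picks the phone whose model gives the highest log-likelihood. The function names (`ldm_log_likelihood`, `classify_segment`) and the parameter names (`F`, `H`, `Q`, `R`, `mu0`, `P0`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def ldm_log_likelihood(Y, F, H, Q, R, mu0, P0):
    """Log-likelihood of a segment Y (frames x dims) under one linear dynamic model.

    Assumed model (not specified in the abstract):
        x_t = F x_{t-1} + w_t,  w_t ~ N(0, Q)
        y_t = H x_t     + v_t,  v_t ~ N(0, R)
        x_0 ~ N(mu0, P0)
    """
    x, P = mu0, P0
    dim = Y.shape[1]
    total = 0.0
    for t, y in enumerate(Y):
        if t > 0:
            # Predict the next state from the previous filtered estimate.
            x = F @ x
            P = F @ P @ F.T + Q
        # Innovation (prediction error) and its covariance.
        err = y - H @ x
        S = H @ P @ H.T + R
        _, logdet = np.linalg.slogdet(S)
        total += -0.5 * (dim * np.log(2 * np.pi) + logdet
                         + err @ np.linalg.solve(S, err))
        # Update the state estimate with the Kalman gain.
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ err
        P = P - K @ H @ P
    return total


def classify_segment(Y, models):
    """Return the best-scoring phone label and all per-phone scores.

    `models` maps a phone label to a dict of the parameters above.
    """
    scores = {phone: ldm_log_likelihood(Y, **params)
              for phone, params in models.items()}
    return max(scores, key=scores.get), scores
```

In the setting described above, each row of `Y` would be one 14-dimensional frame (x and y for 7 articulators) produced every 2 milliseconds by the network, and `models` would hold one parameter set per phone. Segment boundaries are taken as given here; how the paper obtains them is not stated in the abstract.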
Cite as: Frankel, J., Richmond, K., King, S., Taylor, P. (2000) An automatic speech recognition system using neural networks and linear dynamic models to recover and model articulatory traces. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 4, 254-257
@inproceedings{frankel00_icslp,
  author={Joe Frankel and Korin Richmond and Simon King and Paul Taylor},
  title={{An automatic speech recognition system using neural networks and linear dynamic models to recover and model articulatory traces}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={4},
  pages={254--257}
}