Sixth International Conference on Spoken Language Processing
We describe a speech recognition system that uses articulatory parameters as basic features and phone-dependent linear dynamic models. The system first estimates articulatory trajectories from the speech signal. A recurrent neural network, trained on real articulatory data, produces estimates of the x and y coordinates of 7 actual articulator positions in the midsagittal plane every 2 milliseconds. The output of this network is then passed to a set of linear dynamic models, which perform phone recognition.
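The recognition stage can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no model details, so the sketch assumes that each phone has its own linear dynamic model and that a segment is assigned to the phone whose model gives the highest likelihood, computed from Kalman filter innovations. For brevity the state and observation are scalars, whereas the system described above would observe 14 values per frame (x and y for 7 articulators). The names `ScalarLDM`, `log_likelihood`, and `classify`, and all parameter names, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalarLDM:
    # Hypothetical parameter names; scalar state and observation for brevity.
    f: float   # state transition: x_t = f * x_{t-1} + w_t
    q: float   # variance of the process noise w_t
    h: float   # observation: y_t = h * x_t + v_t
    r: float   # variance of the observation noise v_t
    m0: float  # prior mean of the state at the first frame
    p0: float  # prior variance of the state at the first frame


def log_likelihood(model, ys):
    """Log-likelihood of the sequence ys, summed over Kalman filter innovations."""
    mean, var = model.m0, model.p0  # predicted state for the current frame
    total = 0.0
    for y in ys:
        # Predicted observation variance and innovation (prediction error).
        s = model.h * var * model.h + model.r
        innov = y - model.h * mean
        total += -0.5 * (math.log(2.0 * math.pi * s) + innov * innov / s)
        # Measurement update of the state estimate.
        gain = var * model.h / s
        mean_filt = mean + gain * innov
        var_filt = (1.0 - gain * model.h) * var
        # Time update: predict the state for the next frame.
        mean = model.f * mean_filt
        var = model.f * var_filt * model.f + model.q
    return total


def classify(models, ys):
    """Return the label of the model under which ys is most likely."""
    return max(models, key=lambda label: log_likelihood(models[label], ys))


# Two made-up phone models that differ only in where the trajectory starts.
models = {
    "aa": ScalarLDM(f=1.0, q=0.01, h=1.0, r=0.1, m0=1.0, p0=0.1),
    "iy": ScalarLDM(f=1.0, q=0.01, h=1.0, r=0.1, m0=-1.0, p0=0.1),
}
best = classify(models, [0.9, 1.1, 1.0, 0.95])
```

Scoring by innovations means each frame is judged against the model's own prediction from the preceding frames, so the model captures how a trajectory evolves rather than treating frames as independent.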
Bibliographic reference. Frankel, Joe / Richmond, Korin / King, Simon / Taylor, Paul (2000): "An automatic speech recognition system using neural networks and linear dynamic models to recover and model articulatory traces", In ICSLP-2000, vol.4, 254-257.