5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Robust Automatic Speech Recognition by the Application of a Temporal-Correlation-Based Recurrent Multilayer Neural Network to the Mel-Based Cepstral Coefficients

Michel Heon, Hesham Tolba, Douglas O'Shaughnessy

INRS-Telecommunications, Canada

This paper addresses robust speech recognition in noisy environments. Our approach reduces noise directly in the parameters used for recognition, the Mel-based cepstral coefficients. A Temporal-Correlation-Based Recurrent Multilayer Neural Network (TCRMNN) performs noise reduction in the cepstral domain, yielding parameters that vary less under noise and are therefore better suited to robust recognition. Experiments show that the enhanced parameters increase the recognition rate of continuous speech recognition (CSR). All experiments used the HTK Hidden Markov Model Toolkit and a noisy version of the TIMIT database. With this noise reduction technique as a pre-processing stage in the front-end of the HTK-based CSR system, recognition accuracy improved by about 17.77% and 18.58% using single-mixture monophones and triphones, respectively, at a moderate SNR of 20 dB.
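The abstract does not say how the noisy version of TIMIT was produced. As an illustration only, the sketch below shows one common way to mix a clean signal with noise at a target SNR such as the 20 dB condition reported above. The helper names `power` and `mix_at_snr`, the sine-wave stand-in for speech, and the white Gaussian noise are all assumptions made for this example and are not taken from the paper.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + scaled noise has the requested SNR (dB).

    Hypothetical helper for illustration; not the authors' procedure.
    """
    p_clean = power(clean)
    p_noise = power(noise)
    # SNR_dB = 10 * log10(p_clean / (gain**2 * p_noise)), solved for gain.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled


# Stand-in data: a 440 Hz tone sampled at 16 kHz plus white Gaussian noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

noisy, scaled = mix_at_snr(clean, noise, 20.0)
snr = 10 * math.log10(power(clean) / power(scaled))
print(f"achieved SNR: {snr:.2f} dB")
```

In a setup like the one the abstract describes, a noisy signal of this kind would be converted to Mel-based cepstral coefficients before any enhancement network or recognizer sees it.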


Bibliographic reference. Heon, Michel / Tolba, Hesham / O'Shaughnessy, Douglas (1998): "Robust automatic speech recognition by the application of a temporal-correlation-based recurrent multilayer neural network to the mel-based cepstral coefficients", in ICSLP-1998, paper 0807.