Sixth European Conference on Speech Communication and Technology
(EUROSPEECH'99)

Budapest, Hungary
September 5-9, 1999

Neural Network Based Optimal Feature Extraction for ASR

Narada D. Warakagoda, Magne H. Johnsen

Department of Telecommunications, Signal Processing Group, NTNU, Trondheim, Norway

The procedure for calculating Mel-Frequency Cepstral Coefficients (MFCC) is shown to resemble a three-layer, Multilayer Perceptron (MLP)-like structure. Such an MLP is employed as a preprocessor in a hybrid HMM-MLP system, and the possibility of optimizing the whole system as a single entity, with respect to a suitable criterion, is pointed out. This system, together with the Maximum Mutual Information (MMI) criterion, was tested on a speaker-independent, five-broad-class, isolated phoneme recognition task. Results of these preliminary experiments, which clearly indicate the advantage of optimizable preprocessing, are reported.
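The abstract does not spell out how the MFCC steps map onto network layers. The sketch below assumes one plausible mapping: layer 1 is a linear DFT followed by a squared-magnitude nonlinearity, layer 2 is a linear triangular mel filterbank followed by a log nonlinearity, and layer 3 is a linear DCT-II with identity output. All function names, the 8 kHz sample rate, the 64-sample frame, and the filter and coefficient counts are hypothetical illustration choices, not the paper's settings; pre-emphasis and windowing are omitted. Because every layer's weights are stored as ordinary matrices, they could in principle be initialized to these MFCC values and then adjusted jointly with the rest of the system (e.g. under MMI); that training step is not shown.

```python
import math

def linear(weights, x):
    """Plain linear layer: one dot product per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def dft_layer_weights(n_fft):
    """Layer 1 weights (assumed mapping): real and imaginary DFT rows for bins 0..n_fft//2."""
    n_bins = n_fft // 2 + 1
    cos_w = [[math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft)] for k in range(n_bins)]
    sin_w = [[-math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft)] for k in range(n_bins)]
    return cos_w, sin_w

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_layer_weights(n_filters, n_fft, sample_rate):
    """Layer 2 weights: a common triangular mel filterbank over the power bins."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 points equally spaced on the mel scale, mapped back to Hz
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    weights = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))  # rising edge of the triangle
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))  # falling edge of the triangle
            else:
                row.append(0.0)
        weights.append(row)
    return weights

def dct_layer_weights(n_ceps, n_filters):
    """Layer 3 weights: unnormalized DCT-II basis rows."""
    return [[math.cos(math.pi * q * (j + 0.5) / n_filters) for j in range(n_filters)]
            for q in range(n_ceps)]

def build_layers(n_fft=64, sample_rate=8000, n_filters=12, n_ceps=8):
    """Hypothetical helper: all weight matrices, initialized to MFCC values."""
    cos_w, sin_w = dft_layer_weights(n_fft)
    return [cos_w, sin_w,
            mel_layer_weights(n_filters, n_fft, sample_rate),
            dct_layer_weights(n_ceps, n_filters)]

def mfcc_as_mlp(frame, layers, eps=1e-10):
    cos_w, sin_w, mel_w, dct_w = layers
    # Layer 1: linear DFT, squared-magnitude activation -> power spectrum
    re, im = linear(cos_w, frame), linear(sin_w, frame)
    power = [a * a + b * b for a, b in zip(re, im)]
    # Layer 2: linear mel filterbank, log activation (floored to avoid log(0))
    log_mel = [math.log(max(e, eps)) for e in linear(mel_w, power)]
    # Layer 3: linear DCT, identity activation -> cepstral coefficients
    return linear(dct_w, log_mel)
```

For example, `mfcc_as_mlp(frame, build_layers())` on a 64-sample frame returns 8 cepstral coefficients. Scaling the input by a constant only shifts the zeroth coefficient, since the log turns the gain into a constant offset that the higher DCT rows sum to zero.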



Bibliographic reference.  Warakagoda, Narada D. / Johnsen, Magne H. (1999): "Neural network based optimal feature extraction for ASR", In EUROSPEECH'99, 97-100.