Sixth European Conference on Speech Communication and Technology
The procedure for calculating Mel Frequency Cepstral Coefficients (MFCC) is shown to resemble a three-layer Multilayer Perceptron (MLP)-like structure. Such an MLP is employed as a preprocessor in a hybrid HMM-MLP system, and the possibility of optimizing the whole system as a single entity, with respect to a suitable criterion, is pointed out. This system, together with the Maximum Mutual Information (MMI) criterion, was tested on a speaker-independent, five-broad-class, isolated phoneme recognition task. Results of these preliminary experiments, which clearly indicate the advantage of optimizable preprocessing, are reported.
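The resemblance between MFCC computation and an MLP can be illustrated with a minimal sketch. It assumes the usual textbook MFCC pipeline (triangular mel filterbank, logarithm, DCT-II); the abstract does not specify the authors' exact architecture, so the function names, the mel-scale formula, and the filter shapes below are illustrative assumptions, not the paper's implementation. The power spectrum plays the role of the input layer, the filterbank followed by the logarithm acts as a hidden layer, and the DCT acts as a linear output layer.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft_bins, sample_rate):
    """Triangular mel filters as a weight matrix of shape (n_filters, n_fft_bins).

    Assumption: the common 2595*log10(1 + f/700) mel formula.
    """
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Filter edge frequencies, equally spaced on the mel scale.
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    bin_hz = np.linspace(0.0, sample_rate / 2.0, n_fft_bins)
    weights = np.zeros((n_filters, n_fft_bins))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        # Triangle: the lower of the two slopes, clipped at zero.
        weights[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return weights

def dct_matrix(n_ceps, n_filters):
    """Unnormalized DCT-II basis as a weight matrix of shape (n_ceps, n_filters)."""
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_filters)

def mfcc_as_mlp(power_spectrum, w_hidden, w_out, eps=1e-10):
    """Forward pass: input layer -> hidden layer (linear + log) -> linear output layer."""
    hidden = np.log(w_hidden @ power_spectrum + eps)  # log acts as the "activation"
    return w_out @ hidden

# Usage: one 256-sample frame at a hypothetical 8 kHz sampling rate.
frame = np.random.default_rng(0).standard_normal(256)
spectrum = np.abs(np.fft.rfft(frame)) ** 2            # 129 frequency bins
W1 = mel_filterbank(20, spectrum.size, 8000)
W2 = dct_matrix(13, 20)
coeffs = mfcc_as_mlp(spectrum, W1, W2)                 # 13 cepstral coefficients
```

Written this way, `W1` and `W2` are ordinary weight matrices. Initializing them to the filterbank and DCT values reproduces standard MFCCs, and treating them as trainable parameters is what would allow the preprocessor to be optimized jointly with the rest of a recognizer.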
Bibliographic reference. Warakagoda, Narada D. / Johnsen, Magne H. (1999): "Neural network based optimal feature extraction for ASR", in EUROSPEECH'99, 97-100.