The procedure for calculating Mel-Frequency Cepstral Coefficients (MFCC) is shown to resemble the structure of a three-layer Multilayer Perceptron (MLP). Such an MLP is employed as a preprocessor in a hybrid HMM-MLP system, and the possibility of optimizing the whole system as a single entity, with respect to a suitable criterion, is pointed out. This system, together with the Maximum Mutual Information (MMI) criterion, was tested on a speaker-independent, five-broad-class, isolated phoneme recognition task. Results of these preliminary experiments, which clearly indicate the advantage of optimizable preprocessing, are reported.
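The MFCC-to-MLP correspondence can be illustrated with a minimal NumPy sketch. It is not the authors' network: the layer grouping (DFT plus squared magnitude, mel filterbank plus logarithm, DCT with linear output) is one plausible reading of the abstract, and the function names, the 8 kHz sample rate, the 20 filters, and the 12 coefficients are all assumptions chosen for illustration. The sketch shows only the fixed MFCC weights; in a trainable preprocessor these matrices would serve as initial values to be adjusted.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def dft_matrix(n_fft):
    """Complex weights of the real-input DFT, shape (n_fft // 2 + 1, n_fft)."""
    k = np.arange(n_fft // 2 + 1)[:, None]
    t = np.arange(n_fft)[None, :]
    return np.exp(-2j * np.pi * k * t / n_fft)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular mel filters, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_matrix(n_ceps, n_filters):
    """Unnormalized DCT-II basis, shape (n_ceps, n_filters); row 0 is c0."""
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_filters)

def mfcc_as_layers(frame, sample_rate=8000, n_filters=20, n_ceps=12):
    """MFCC of one frame, written as three weight matrices with nonlinearities."""
    n_fft = len(frame)
    # Layer 1: linear map (DFT), then squared-magnitude nonlinearity.
    h1 = np.abs(dft_matrix(n_fft) @ frame) ** 2
    # Layer 2: linear map (mel filterbank), then log nonlinearity.
    # The small floor (an assumption) avoids log(0).
    h2 = np.log(mel_filterbank(n_filters, n_fft, sample_rate) @ h1 + 1e-10)
    # Layer 3: linear map (DCT) with a linear output.
    return dct_matrix(n_ceps, n_filters) @ h2
```

Calling `mfcc_as_layers(frame)` on a 256-sample frame returns a vector of 12 coefficients. Because every step is a matrix product followed by an elementwise function, gradients of a recognition criterion can in principle be propagated back through all three weight matrices, which is what makes joint optimization with the recognizer conceivable.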
Cite as: Warakagoda, N.D., Johnsen, M.H. (1999) Neural network based optimal feature extraction for ASR. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 97-100, doi: 10.21437/Eurospeech.1999-28
@inproceedings{warakagoda99_eurospeech, author={Narada D. Warakagoda and Magne H. Johnsen}, title={{Neural network based optimal feature extraction for ASR}}, year=1999, booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)}, pages={97--100}, doi={10.21437/Eurospeech.1999-28} }