EUROSPEECH 2003 - INTERSPEECH 2003
8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Integrated Pitch and MFCC Extraction for Speech Reconstruction and Speech Recognition Applications

Xu Shao, Ben P. Milner, Stephen J. Cox

University of East Anglia, U.K.

This paper proposes an integrated speech front-end for both speech recognition and speech reconstruction applications. Speech is first decomposed into a set of frequency bands by an auditory model, whose output is then used to extract both robust pitch estimates and MFCC vectors. Initial tests used a 128-channel auditory model, but results show that this can be reduced significantly, to between 23 and 32 channels. A detailed analysis of the pitch classification accuracy and the RMS pitch error shows the system to be more robust than both comb-function and LPC-based pitch extraction. Speech recognition results show that the auditory-based cepstral coefficients give very similar performance to conventional MFCCs. Spectrograms and informal listening tests also indicate that speech reconstructed from the auditory-based cepstral coefficients and pitch has similar quality to that reconstructed from conventional MFCCs and pitch.
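As a rough illustration of such a combined front-end, the sketch below extracts cepstral coefficients and a pitch estimate from a single speech frame using one shared filterbank analysis. It is not the paper's method: a triangular mel filterbank stands in for the auditory model, the channel count (23) is simply taken from the range quoted above, and the pitch comes from a plain autocorrelation peak search rather than the paper's auditory-based estimator; all function names and parameter values are illustrative.

```python
# Minimal sketch of a combined front-end: filterbank -> cepstral coefficients,
# plus an autocorrelation-based pitch estimate from the same frame.
# Illustrative stand-in only, not the paper's auditory-model front-end.

import numpy as np


def mel_filterbank(num_channels, fft_size, sample_rate, f_low=100.0, f_high=4000.0):
    """Build a triangular mel-spaced filterbank (num_channels x fft_size//2+1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_points = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), num_channels + 2)
    bins = np.floor((fft_size + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)

    fbank = np.zeros((num_channels, fft_size // 2 + 1))
    for ch in range(1, num_channels + 1):
        left, centre, right = bins[ch - 1], bins[ch], bins[ch + 1]
        for k in range(left, centre):
            fbank[ch - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[ch - 1, k] = (right - k) / max(right - centre, 1)
    return fbank


def frame_features(frame, fbank, num_ceps=13, sample_rate=8000):
    """Return (cepstral coefficients, pitch estimate in Hz) for one frame."""
    # Cepstra: power spectrum -> filterbank energies -> log -> DCT-II.
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    energies = np.log(fbank @ spectrum + 1e-10)
    n = len(energies)
    dct = np.cos(np.pi * np.outer(np.arange(num_ceps), np.arange(n) + 0.5) / n)
    ceps = dct @ energies

    # Pitch: peak of the autocorrelation within a plausible lag range (60-400 Hz).
    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    lo, hi = int(sample_rate / 400), int(sample_rate / 60)
    lag = lo + int(np.argmax(ac[lo:hi]))
    pitch = sample_rate / lag if ac[lag] > 0.3 * ac[0] else 0.0  # 0 = unvoiced
    return ceps, pitch


if __name__ == "__main__":
    # Synthetic harmonic frame with a 150 Hz fundamental.
    sr, n = 8000, 256
    t = np.arange(n) / sr
    frame = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)
    fbank = mel_filterbank(num_channels=23, fft_size=n, sample_rate=sr)
    ceps, f0 = frame_features(frame, fbank, sample_rate=sr)
    print("first 5 cepstra:", np.round(ceps[:5], 2), "pitch:", round(f0, 1), "Hz")
```

On the synthetic 150 Hz frame, the example should print a pitch estimate close to 150 Hz alongside the first few cepstral coefficients, showing how a single filterbank analysis can feed both the recognition and reconstruction sides of such a front-end.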


Bibliographic reference: Shao, Xu / Milner, Ben P. / Cox, Stephen J. (2003): "Integrated pitch and MFCC extraction for speech reconstruction and speech recognition applications", in Proc. EUROSPEECH-2003, pp. 1725-1728.