EUROSPEECH 2003 - INTERSPEECH 2003
In this work we propose a time domain technique to estimate an all-pole model based on the minimum variance distortionless response (MVDR) using a warped short time frequency axis such as the Mel scale. The use of the MVDR eliminates the overemphasis of harmonic peaks typically seen in medium and high pitched voiced speech when spectral estimation is based on linear prediction (LP). Moreover, warping the frequency axis prior to MVDR spectral estimation ensures more parameters in the spectral model are allocated to the low, as opposed to high, frequency regions of the spectrum, thereby mimicking the human auditory system. In a series of speech recognition experiments on the Switchboard Corpus (spontaneous English telephone speech), the proposed approach achieved a word error rate (WER) of 32.1% for female speakers, which is clearly superior to the 33.2% WER obtained by the usual combination of Mel warping and linear prediction.
Bibliographic reference. Wolfel, Matthias / McDonough, John / Waibel, Alex (2003): "Minimum variance distortionless response on a warped frequency scale", In EUROSPEECH-2003, 1021-1024.