EUROSPEECH 2003 - INTERSPEECH 2003
This paper describes an accurate feature representation for continuous clean speech recognition. The technique consists of performing a moderate-order Linear Predictive (LP) analysis and computing the Minimum Variance Distortionless Response (MVDR) spectrum from the resulting LP coefficients. This feature representation, PMCCs, was earlier shown to outperform MFCCs under different noise conditions, with emphasis on car noise. That improvement was attributed to the better spectrum and envelope modeling properties of the MVDR methodology. This study shows that the representation is also quite efficient for clean speech recognition. In fact, PMCCs are shown to provide a more accurate envelope representation and to reduce speaker variability. This, in turn, yields a 12.8% relative word error rate (WER) reduction with respect to MFCCs on the combination of the Wall Street Journal (WSJ) Nov'92 dev/eval sets. Accurate envelope modeling and reduced speaker variability also lead to faster decoding, owing to more efficient pruning in the search stage. The total gain in decoding speed is 22.4% relative to the standard MFCC features. It is also shown that PMCCs are not very demanding in terms of computation when compared to MFCCs. Therefore, we conclude that the PMCC feature extraction scheme is a better representation of clean speech, as well as noisy speech, than the MFCC scheme.
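To make the LP-to-MVDR step mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the closed form commonly given in the MVDR literature, in which the inverse MVDR spectrum is a cosine series whose coefficients are computed from the LP coefficients and the prediction error power. The function names `lp_coefficients` and `mvdr_spectrum`, the order of 12, the FFT size of 512, and the synthetic test frame are all illustrative assumptions; the perceptual warping and the cepstral conversion that complete a PMCC front end are not shown.

```python
import numpy as np

def lp_coefficients(frame, order):
    """Autocorrelation-method LP analysis (Levinson-Durbin recursion).

    Returns (a, err): a[0] == 1 and A(z) = sum_k a[k] z^-k is the
    prediction error filter; err is the prediction error power.
    """
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def mvdr_spectrum(a, err, n_fft=512):
    """MVDR power spectrum on n_fft // 2 + 1 bins from LP coefficients.

    Assumes real-valued coefficients, order >= 1 and n_fft > 2 * order.
    """
    m = len(a) - 1
    mu = np.zeros(m + 1)
    for k in range(m + 1):
        i = np.arange(m - k + 1)
        mu[k] = np.sum((m + 1 - k - 2 * i) * a[i] * a[i + k]) / err
    # Build the symmetric sequence mu[-m..m] in FFT (circular) order so that
    # its transform is the real denominator mu[0] + 2 * sum_k mu[k] cos(w k).
    seq = np.zeros(n_fft)
    seq[:m + 1] = mu
    seq[-m:] = mu[1:][::-1]
    denom = np.real(np.fft.rfft(seq))
    return 1.0 / denom

# Hypothetical usage on a synthetic frame (a noisy sinusoid, not speech data).
rng = np.random.default_rng(0)
frame = np.sin(0.3 * np.arange(400)) + 0.1 * rng.standard_normal(400)
a, err = lp_coefficients(frame, order=12)
spectrum = mvdr_spectrum(a, err, n_fft=512)
```

Taking the real part of the FFT of the symmetric coefficient sequence evaluates the denominator on a uniform frequency grid in one call, which is one reason the MVDR step adds little cost on top of the LP analysis itself.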
Bibliographic reference. Yapanel, Umit H. / Dharanipragada, Satya / Hansen, John H.L. (2003): "Perceptual MVDR-based cepstral coefficients (PMCCs) for high accuracy speech recognition", In EUROSPEECH-2003, 1829-1832.