COST278 and ISCA Tutorial and Research Workshop (ITRW) on Robustness Issues in Conversational Interaction

University of East Anglia, Norwich, UK
August 30-31, 2004

On De-Emphasizing the Spurious Components in the Spectral Modulation for Robust Speech Recognition

Vivek Tyagi, Christian Wellekens

Institute Eurecom, Sophia Antipolis, France

It is well known that the peaks in the log Mel-filter bank spectrum essentially represent the "formants" of the speech signal and are important cues for characterizing the sound. However, perturbations in the low-energy regions of the log Mel-filter bank spectrum make the cepstral comparison unnecessarily sensitive, especially in the presence of additive noise. In this paper, we present a technique that suppresses this unnecessary sensitivity of the log Mel-filter bank spectrum (logMelFBS) of speech signals while preserving the fundamental formant structure. In practice, our technique is quite similar to spectral root homomorphic deconvolution systems (SRDS) [2]. However, we work with the log homomorphic deconvolution system (LHDS) [1] and exponentiate the logMelFBS to emphasize the spectral peaks (formants). Experiments with speech signals show that features based on the proposed technique yield a significant increase in speech recognition performance in non-stationary noise conditions compared with MFCC features, while achieving slightly better performance in clean conditions. The proposed technique performs almost the same as root Mel-cepstral coefficients (RMFCC) in both noisy and clean conditions.
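The abstract contrasts three ways of compressing Mel filter-bank energies before the cepstral transform: the log (MFCC), a root (RMFCC), and an exponentiated log spectrum. The sketch below uses only the Python standard library and is an illustration under stated assumptions, not the authors' implementation. The function `exp_log_mfcc`, its exponent `alpha`, and the shift that makes the log spectrum non-negative are hypothetical choices, as are the RMFCC exponent `gamma = 0.1` and the toy 5-band spectrum. It perturbs a low-energy band and the spectral peak by the same 20% and measures how far each feature vector moves.

```python
import math


def dct2(x, num_ceps):
    """Unnormalized DCT-II of x, keeping the first num_ceps coefficients."""
    n = len(x)
    return [
        sum(x[k] * math.cos(math.pi * q * (k + 0.5) / n) for k in range(n))
        for q in range(num_ceps)
    ]


def mfcc(mel_energies, num_ceps, floor=1e-10):
    """Conventional MFCC: log-compress each Mel filter-bank energy, then DCT-II."""
    return dct2([math.log(max(e, floor)) for e in mel_energies], num_ceps)


def root_mfcc(mel_energies, num_ceps, gamma=0.1):
    """RMFCC-style root cepstrum: compress with e**gamma instead of log.

    The value gamma=0.1 is an illustrative assumption.
    """
    return dct2([max(e, 0.0) ** gamma for e in mel_energies], num_ceps)


def exp_log_mfcc(mel_energies, num_ceps, alpha=2.0, floor=1e-10):
    """Hypothetical sketch: raise the log Mel spectrum to a power alpha > 1.

    ASSUMPTION (not taken from the paper): the log spectrum is shifted so that
    its smallest value is zero, which keeps the power defined and monotone.
    """
    log_spec = [math.log(max(e, floor)) for e in mel_energies]
    base = min(log_spec)
    # alpha > 1 stretches the spectral peaks and squeezes the valleys toward 0.
    return dct2([(v - base) ** alpha for v in log_spec], num_ceps)


def cepstral_distance(a, b):
    """Euclidean distance between two cepstral vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy 5-band spectrum: a peak at band 2, a low-energy (non-minimum) band at band 3.
clean = [1.0, 100.0, 1000.0, 10.0, 1.0]
valley_hit = [1.0, 100.0, 1000.0, 12.0, 1.0]  # +20% in the low-energy band
peak_hit = [1.0, 100.0, 1200.0, 10.0, 1.0]    # +20% at the spectral peak

for name, feat in [("MFCC", mfcc), ("RMFCC", root_mfcc), ("exp-log", exp_log_mfcc)]:
    c = feat(clean, 5)
    dv = cepstral_distance(c, feat(valley_hit, 5))
    dp = cepstral_distance(c, feat(peak_hit, 5))
    print(f"{name:8s} valley: {dv:.4f}  peak: {dp:.4f}")
```

Because the log maps equal relative changes to equal offsets, the two MFCC distances come out identical when all 5 coefficients are kept. Under root compression the change in a band is roughly gamma times e to the power gamma times the relative change, and under the exponentiated log it is roughly alpha times v to the power (alpha - 1) times the log change, where v is the shifted log energy. Both factors grow with band energy, so in this sketch the low-energy perturbation moves the features less than the peak perturbation does, which is consistent with the abstract's observation that the two approaches perform similarly.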

References

  1. A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, Prentice-Hall, NJ, USA, 1989, pp. 771-772.
  2. J. S. Lim, "Spectral Root Homomorphic Deconvolution System," IEEE Trans. on ASSP, Vol. ASSP-27, No. 3, June 1979.


Bibliographic reference: Tyagi, Vivek / Wellekens, Christian (2004): "On de-emphasizing the spurious components in the spectral modulation for robust speech recognition", in Robust2004, paper 24.