8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


On Factorizing Spectral Dynamics for Robust Speech Recognition

Vivek Tyagi, Iain A. McCowan, Hervé Bourlard, Hemant Misra

IDIAP, Switzerland

In this paper, we introduce new dynamic speech features based on the modulation spectrum. These features, termed Mel-cepstrum Modulation Spectrum (MCMS), map the time trajectories of the spectral dynamics into a series of slow- and fast-moving orthogonal components, providing a more general and discriminative range of dynamic features than traditional delta and acceleration features. The features can be seen as the outputs of an array of band-pass filters spread over the cepstral modulation frequency range of interest. Experiments show that, in addition to a slight improvement in clean conditions, these new dynamic features yield a significant increase in speech recognition performance in various noise conditions when compared directly with the standard temporal derivative features and RASTA-PLP features.
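The idea of treating each cepstral trajectory as a signal and reading off its slow and fast modulation components can be illustrated with a short-time DFT taken over a sliding window of frames. The sketch below is only an illustration under stated assumptions: the abstract does not give the exact MCMS definition, so the function name `mcms_like_features`, the window length, the chosen modulation bins, and the edge handling are all hypothetical. Each cosine/sine projection acts as one band-pass filter on the trajectory, and the projections for different bins are mutually orthogonal over the window.

```python
import math


def mcms_like_features(cepstra, window=11, bins=(1, 2)):
    """Project each cepstral trajectory onto cosine/sine bases over a sliding window.

    cepstra: list of frames, each a list of cepstral coefficients.
    Returns one feature vector per frame holding, for every coefficient and
    every modulation bin q, the cosine and sine projections (in that order).
    NOTE: window length, bins and edge handling are assumptions for
    illustration, not values taken from the paper.
    """
    n_frames = len(cepstra)
    n_coeffs = len(cepstra[0])
    half = window // 2
    features = []
    for t in range(n_frames):
        frame_feats = []
        for k in range(n_coeffs):
            # Window placed around frame t; indices are clamped at the edges,
            # which repeats the first/last frame as padding.
            traj = [cepstra[min(max(t + p - half, 0), n_frames - 1)][k]
                    for p in range(window)]
            for q in bins:
                # DFT bin q of the windowed trajectory: a band-pass filter
                # centred on modulation frequency q / window cycles per frame.
                re = sum(c * math.cos(2 * math.pi * q * p / window)
                         for p, c in enumerate(traj))
                im = sum(c * math.sin(2 * math.pi * q * p / window)
                         for p, c in enumerate(traj))
                frame_feats.extend([re, im])
        features.append(frame_feats)
    return features
```

In this sketch a low bin such as `q = 1` responds to slowly varying trajectories and higher bins to faster ones, which is the sense in which such features generalize the single fixed filters implied by delta and acceleration coefficients. A constant trajectory produces zero output in every bin with `q >= 1`.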


Bibliographic reference. Tyagi, Vivek / McCowan, Iain A. / Bourlard, Hervé / Misra, Hemant (2003): "On factorizing spectral dynamics for robust speech recognition", in EUROSPEECH-2003, 981-984.