Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Comprehensive Modulation Representation for Automatic Speech Recognition

Yadong Wang (1), Steven Greenberg (2), Jayaganesh Swaminathan (3), Ramdas Kumaresan (3), David Poeppel (1)

(1) University of Maryland, USA; (2) Technical University of Denmark, Denmark; (3) University of Rhode Island, USA

We present a new feature representation for speech recognition based on both amplitude modulation spectra (AMS) and frequency modulation spectra (FMS). A comprehensive modulation spectral (CMS) approach is defined and analyzed based on a modulation model of the band-pass signal. The speech signal is first processed by a bank of specially designed auditory band-pass filters. CMS are then extracted from the filter outputs and used as the features for automatic speech recognition (ASR). The new features improve recognition performance on noisy speech: on the Aurora 2 task, they result in an improvement of 23.43% relative to traditional mel-cepstrum front-end features using a 3 GMM HMM back-end. Although the improvements are relatively modest, the novelty of the method and its potential for performance enhancement warrant serious attention for future-generation ASR applications.
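As a rough illustration of the idea in the abstract, the sketch below splits a single band-pass signal into an amplitude envelope and an instantaneous frequency, then takes the magnitude spectrum of each as stand-ins for the AMS and FMS. This is not the authors' method: the abstract does not specify the filter design or the AM/FM decomposition, so the FFT-based analytic signal, the function names `analytic_signal` and `modulation_spectra`, and the synthetic test tone are all assumptions made for illustration. The input is assumed to be the output of one band-pass filter.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT: zero negative frequencies, double positive ones.

    Assumption: a generic Hilbert-style construction, not the paper's model.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC unchanged
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin unchanged
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def modulation_spectra(band_signal, fs):
    """Return (ams, fms, envelope, inst_freq) for one band-pass signal.

    ams / fms are magnitude spectra of the mean-removed envelope and
    instantaneous frequency; both are hypothetical stand-ins for the
    AMS and FMS named in the abstract.
    """
    z = analytic_signal(band_signal)
    envelope = np.abs(z)                                # amplitude modulation
    phase = np.unwrap(np.angle(z))
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)     # frequency modulation, in Hz
    ams = np.abs(np.fft.rfft(envelope - envelope.mean()))
    fms = np.abs(np.fft.rfft(inst_freq - inst_freq.mean()))
    return ams, fms, envelope, inst_freq


# Synthetic example (assumed values): a 1 kHz carrier with 4 Hz amplitude modulation.
fs = 8000
t = np.arange(fs) / fs                                  # one second of samples
band_signal = (1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 1000 * t)

ams, fms, envelope, inst_freq = modulation_spectra(band_signal, fs)
print("AMS peak bin (1 Hz per bin):", int(np.argmax(ams)))
print("Mean instantaneous frequency (Hz):", float(np.mean(inst_freq)))
```

With a one-second signal, each bin of the envelope spectrum corresponds to 1 Hz, so the position of the AMS peak can be read directly as a modulation frequency. A full front end in the spirit of the abstract would repeat this for every filter in the bank and combine the results into a feature vector; how the paper does that is not stated here.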


Bibliographic reference. Wang, Yadong / Greenberg, Steven / Swaminathan, Jayaganesh / Kumaresan, Ramdas / Poeppel, David (2005): "Comprehensive modulation representation for automatic speech recognition", in INTERSPEECH-2005, 3025-3028.