10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Subband Temporal Modulation Spectrum Normalization for Automatic Speech Recognition in Reverberant Environments

Xugang Lu (1), Masashi Unoki (2), Satoshi Nakamura (1)

(1) NICT, Japan
(2) JAIST, Japan

Speech recognition in reverberant environments remains a challenging problem. In this paper, we first investigate the effect of reverberation on subband temporal envelopes using the modulation transfer function (MTF). Based on this investigation, we propose an algorithm that normalizes the subband temporal modulation spectrum (TMS) to reduce the diffusion effect of reverberation. During normalization, the subband TMS of both clean and reverberated speech is normalized to a reference TMS calculated from a clean speech data set for each frequency subband. The subband temporal envelopes are then restored from the normalized subband TMS by an inverse Fourier transform that keeps their original phase information. We tested the algorithm on reverberated speech recognition tasks in a reverberant room. For comparison, traditional Mel-frequency cepstral coefficient (MFCC) features and relative spectral (RASTA) filtering were used. Experimental results showed that features extracted with the proposed normalization method achieved an overall relative improvement of 80.64% in recognition rate.
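The normalization step described in the abstract can be illustrated with a minimal sketch for a single frequency subband. This is not the authors' implementation. The abstract does not give the exact normalization rule, so the sketch assumes a real, positive gain per modulation frequency that maps a smoothed magnitude of the utterance's TMS onto the reference TMS. The function names, the moving-average smoothing, and the parameters `smooth_width` and `eps` are all hypothetical. Only the overall flow (Fourier transform of the envelope, magnitude normalization toward a clean-speech reference, inverse transform with the original phase) follows the abstract.

```python
import numpy as np


def modulation_spectrum(envelope):
    """Magnitude and phase of the Fourier transform of one subband envelope."""
    spectrum = np.fft.rfft(envelope)
    return np.abs(spectrum), np.angle(spectrum)


def reference_tms(clean_envelopes):
    """Reference TMS for one subband.

    Assumption: the reference is the mean modulation magnitude over a set of
    equal-length clean-speech envelopes from the same subband.
    """
    return np.mean([np.abs(np.fft.rfft(e)) for e in clean_envelopes], axis=0)


def smooth(magnitude, width):
    """Moving average along the modulation-frequency axis (hypothetical choice)."""
    kernel = np.ones(width) / width
    return np.convolve(magnitude, kernel, mode="same")


def normalize_envelope(envelope, reference, smooth_width=5, eps=1e-8):
    """Normalize the TMS magnitude toward `reference` and keep the phase."""
    magnitude, phase = modulation_spectrum(envelope)
    # Assumed rule: real, positive gain per modulation frequency.
    gain = reference / (smooth(magnitude, smooth_width) + eps)
    # Recombine the normalized magnitude with the ORIGINAL phase.
    normalized = gain * magnitude * np.exp(1j * phase)
    # Inverse transform restores a temporal envelope of the original length.
    return np.fft.irfft(normalized, n=len(envelope))
```

In use, `reference_tms` would be computed once per subband from clean training envelopes, and `normalize_envelope` would then be applied to the envelopes of both clean and reverberated utterances before feature extraction. Because the gain is real and positive, the phase of each modulation component is unchanged. The restored envelope is not guaranteed to be non-negative, so a real system would need to handle that case.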


Bibliographic reference.  Lu, Xugang / Unoki, Masashi / Nakamura, Satoshi (2009): "Subband temporal modulation spectrum normalization for automatic speech recognition in reverberant environments", In INTERSPEECH-2009, 2503-2506.