INTERSPEECH 2006 - ICSLP
This paper proposes a robust feature extraction method for automatic speech recognition (ASR) systems in reverberant environments. The method incorporates our previously proposed sub-band power envelope inverse filtering algorithm, which is based on the modulation transfer function (MTF), as a front-end processor for ASR. The impulse response of the room acoustics is modeled as white noise modulated by an exponentially decaying envelope, and the speech in each sub-band is modeled as temporally modulated white noise. Under these assumptions, the impulse response of the environment does not need to be measured. Testing demonstrated that the algorithm can restore the temporal power envelope of reverberant speech in each sub-band and thus reduce the loss of speech intelligibility caused by reverberation. Its recognition performance was tested on digitized Japanese speech, using reverberant speech created by simple convolution of the room impulse response with the speech. Averaged over reverberation times from 0.1 to 2.0 s, the algorithm achieved a 32.1% higher error reduction rate than traditional cepstral mean normalization (CMN) of the auditory-power-spectrum-based method (AFCC).
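The modeling assumptions above can be illustrated with a minimal sketch for a single sub-band. It is not the implementation from the paper. It assumes the commonly used MTF form in which the room's power envelope decays as a^2 exp(-13.8 t / T_R), where T_R is the reverberation time and 13.8 corresponds to a 60 dB power decay at t = T_R. Under that assumption, the reverberant power envelope is the clean power envelope convolved with the decay, and a first-order filter inverts it. The function names, the gain `a`, and the envelope sampling rate `fs` are hypothetical, and T_R is taken as known here, whereas a blind front-end would have to estimate it.

```python
import numpy as np


def reverberant_power_envelope(env_x2, t_r, fs, a=1.0):
    """Simulate the reverberant power envelope of one sub-band.

    Assumption: the room power envelope is a^2 * exp(-13.8 * t / t_r),
    and the observed envelope is its causal convolution with env_x2.
    """
    n = len(env_x2)
    t = np.arange(n) / fs
    env_h2 = a**2 * np.exp(-13.8 * t / t_r)
    # Keep only the first n samples so input and output are aligned.
    return np.convolve(env_x2, env_h2)[:n]


def inverse_filter_power_envelope(env_y2, t_r, fs, a=1.0):
    """Undo the exponential decay with a first-order inverse filter.

    env_x2[n] = (env_y2[n] - b * env_y2[n - 1]) / a^2,
    with b = exp(-13.8 / (t_r * fs)).
    """
    b = np.exp(-13.8 / (t_r * fs))
    # One-sample delay of the reverberant envelope, zero initial state.
    delayed = np.concatenate(([0.0], env_y2[:-1]))
    return (env_y2 - b * delayed) / a**2


# Hypothetical example: a 4 Hz modulated power envelope sampled at 100 Hz.
fs = 100.0
t = np.arange(200) / fs
clean = 1.0 + 0.8 * np.sin(2.0 * np.pi * 4.0 * t)
smeared = reverberant_power_envelope(clean, t_r=0.5, fs=fs)
restored = inverse_filter_power_envelope(smeared, t_r=0.5, fs=fs)
```

When the assumed T_R and gain match those used to generate the reverberant envelope, `restored` agrees with `clean` up to floating-point error. A mismatched T_R leaves residual smearing or over-sharpening of the envelope.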
Bibliographic reference. Lu, Xugang / Unoki, Masashi / Akagi, Masato (2006): "A robust feature extraction based on the MTF concept for speech recognition in reverberant environment", In INTERSPEECH-2006, paper 1801-Thu2CaP.7.