This paper proposes two methods for robust automatic speech recognition (ASR) in reverberant environments. Unlike other methods, which mostly achieve dereverberation by inverse filtering with blindly estimated room impulse responses, the proposed methods exploit the characteristics of speech. The first method - Harmonicity based Feature Analysis - takes advantage of the harmonic components of speech, which are assumed to be undistorted. The second method - Temporal Power Envelope Feature Analysis - utilizes the temporal modulation structure of speech, which represents the phoneme-level temporal events that carry most of the intelligibility information. Both methods increase recognition performance remarkably, each in a different way, and combining them joins their individual advantages. To examine how well harmonicity and temporal modulation structure serve reverberant ASR, the methods are tested with clean and with reverberant training. The results show that, even in strongly reverberant conditions, both methods achieve practically applicable performance with reverberant training. Besides testing performance as a function of reverberation time, performance as a function of speaker-to-microphone distance is also tested, which is another new contribution of this paper.
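To make the notion of a temporal power envelope concrete, the following is a minimal sketch and not the feature analysis used in the paper. It assumes the envelope can be approximated by squaring the signal and smoothing it with a moving average; the abstract does not specify the authors' extraction or filtering steps. The function names, the 20 ms window, and the synthetic amplitude-modulated test tone are all hypothetical choices made for illustration.

```python
import math


def temporal_power_envelope(signal, sample_rate, window_ms=20.0):
    """Approximate a temporal power envelope (illustrative assumption):
    square each sample, then smooth with a centered moving average."""
    win = max(1, int(round(sample_rate * window_ms / 1000.0)))
    half = win // 2
    power = [x * x for x in signal]

    # Prefix sums let each window mean be computed in constant time.
    prefix = [0.0]
    for p in power:
        prefix.append(prefix[-1] + p)

    n = len(power)
    envelope = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        envelope.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return envelope


def make_am_tone(sample_rate=8000, duration_s=1.0, carrier_hz=500.0, mod_hz=4.0):
    """Hypothetical test signal: a sine carrier with a slow amplitude
    modulation, standing in for the slow modulations of speech."""
    n = int(sample_rate * duration_s)
    tone = []
    for i in range(n):
        t = i / sample_rate
        modulation = 0.5 * (1.0 + math.sin(2.0 * math.pi * mod_hz * t))
        tone.append(modulation * math.sin(2.0 * math.pi * carrier_hz * t))
    return tone
```

Calling `temporal_power_envelope(make_am_tone(), 8000)` returns one envelope value per input sample. The envelope rises and falls with the 4 Hz modulation while the 500 Hz carrier is averaged out, which is the kind of slow temporal structure the abstract refers to.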
Bibliographic reference. Petrick, Rico / Lu, Xugang / Unoki, Masashi / Akagi, Masato / Hoffmann, Rüdiger (2008): "Robust front end processing for speech recognition in reverberant environments: utilization of speech characteristics", In INTERSPEECH-2008, 658-661.