16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Dereverberation for Active Human-Robot Communication Robust to Speaker's Face Orientation

Randy Gomez, Levko Ivanchuk, Keisuke Nakamura, Takeshi Mizumoto, Kazuhiro Nakadai

Honda Research Institute Japan, Japan

Reverberation poses a problem for active robot audition systems. A change in the speaker's face orientation relative to the robot perturbs the room acoustics and alters the reverberation condition at runtime, which degrades automatic speech recognition (ASR) performance. In this paper, we present a method to mitigate this problem in the context of ASR. First, filter coefficients are derived to correct the Room Transfer Function (RTF) for each change in face orientation. We treat the change in face orientation as a filtering mechanism that captures the room acoustics. The joint dynamics between the filter and the observed reverberant speech are then investigated with the ASR system taken into account. Second, we introduce a gain correction scheme to compensate for the change in power as a function of face orientation. This scheme is also linked to the ASR, with the gain parameters derived via the Viterbi algorithm. Experimental results using Hidden Markov Model-Deep Neural Network (HMM-DNN) ASR in a reverberant robot environment show that the proposed method is robust to changes in face orientation and outperforms state-of-the-art dereverberation techniques.
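
The two ideas in the abstract, treating the face-orientation change as a filter applied to the RTF and compensating the resulting change in power, can be illustrated with a toy signal model. The following is only a minimal sketch under stated assumptions, not the authors' method: the function names, the random "speech" signal, the exponentially decaying RTF, and the three-tap orientation filter are all hypothetical, and the gain is obtained by simple power matching rather than via the Viterbi algorithm as in the paper.

```python
import numpy as np

def reverberate(signal, impulse_response):
    """Model reverberation as convolution with an impulse response."""
    return np.convolve(signal, impulse_response)

def correct_rtf(rtf_frontal, orientation_filter):
    """Hypothetical RTF correction: filter the frontal RTF with an
    orientation-dependent filter (an assumption for illustration)."""
    return np.convolve(rtf_frontal, orientation_filter)

def gain_correction(signal, reference_power):
    """Scale the signal so its mean power equals reference_power.
    Plain power matching, not the ASR-linked scheme of the paper."""
    gain = np.sqrt(reference_power / np.mean(signal ** 2))
    return gain * signal, gain

rng = np.random.default_rng(0)
clean = rng.standard_normal(1600)                      # stand-in for clean speech
rtf_frontal = rng.standard_normal(256) * np.exp(-np.arange(256) / 40.0)  # toy RTF
orientation_filter = np.array([0.6, 0.25, 0.1])        # made-up filter taps

# Speech reverberated by the room, then filtered by the orientation change.
observed = reverberate(reverberate(clean, rtf_frontal), orientation_filter)

# The same signal, modeled with a single corrected RTF.
modeled = reverberate(clean, correct_rtf(rtf_frontal, orientation_filter))

# Restore the power the signal would have had with the speaker facing the robot.
frontal = reverberate(clean, rtf_frontal)
compensated, gain = gain_correction(observed, np.mean(frontal ** 2))
```

Because convolution is associative, `observed` and `modeled` coincide, which is why a change in orientation can be absorbed into a corrected RTF in this toy setting. The orientation filter here attenuates the signal, so the computed `gain` scales it back up to the frontal power level.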

Bibliographic reference.  Gomez, Randy / Ivanchuk, Levko / Nakamura, Keisuke / Mizumoto, Takeshi / Nakadai, Kazuhiro (2015): "Dereverberation for active human-robot communication robust to speaker's face orientation", In INTERSPEECH-2015, 180-184.