15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Robust Speech Recognition Using Temporal Masking and Thresholding Algorithm

Chanwoo Kim (1), Kean K. Chin (1), Michiel Bacchiani (1), Richard M. Stern (2)

(1) Google, USA
(2) Carnegie Mellon University, USA

In this paper, we present a new dereverberation algorithm called Temporal Masking and Thresholding (TMT), which enhances the temporal spectra of spectral features for robust speech recognition in reverberant environments. The algorithm is motivated by the precedence effect and by temporal masking in human auditory perception. It improves on our previous dereverberation algorithm, Suppression of Slowly-varying components and the Falling edge of the power envelope (SSF). TMT uses a different mathematical model of temporal masking and thresholding than the one underlying SSF: the nonlinear highpass filtering used in SSF is replaced by a masking mechanism based on a combination of peak detection and dynamic thresholding. Speech recognition results show that TMT provides better recognition accuracy in reverberant environments than other algorithms such as LTLSS, VTS, and SSF.
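The abstract does not give the TMT equations, so the following is only a minimal sketch of the general idea of masking by peak detection and dynamic thresholding, not the authors' formulation. It assumes a single frequency channel whose frame powers are already available; the function name temporal_mask and the parameters decay and floor, including their default values, are hypothetical choices made for illustration.

```python
def temporal_mask(power, decay=0.85, floor=0.2):
    """Apply a peak-tracking temporal mask to one channel's power envelope.

    power: sequence of non-negative frame powers for one frequency channel.
    decay: assumed per-frame decay factor of the tracked peak (illustrative).
    floor: assumed attenuation applied to masked frames (illustrative).
    """
    out = []
    peak = 0.0
    for p in power:
        # Dynamic threshold: the previously detected peak, decayed by one frame.
        threshold = decay * peak
        if p >= threshold:
            # Onset or rising energy: the frame is kept unchanged.
            out.append(p)
        else:
            # Falling edge (likely reverberant tail): the frame is attenuated.
            out.append(floor * p)
        # Peak detection: keep the larger of the decayed peak and the input.
        peak = max(threshold, p)
    return out


if __name__ == "__main__":
    # A strong onset followed by a decaying tail and a second onset.
    print(temporal_mask([1.0, 0.5, 0.9, 0.1]))
```

In this sketch, frames that fall below the decayed peak are treated as masked and scaled down, while frames at or above it pass through, which loosely mirrors the emphasis on onsets suggested by the precedence effect. A full system would apply such a rule per channel of a filterbank before feature extraction.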

Bibliographic reference.  Kim, Chanwoo / Chin, Kean K. / Bacchiani, Michiel / Stern, Richard M. (2014): "Robust speech recognition using temporal masking and thresholding algorithm", In INTERSPEECH-2014, 2734-2738.