15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Multichannel Speech Dereverberation Based on Convolutive Nonnegative Tensor Factorization for ASR Applications

Seyedmahdad Mirsamadi, John H. L. Hansen

University of Texas at Dallas, USA

Room reverberation is a primary cause of failure in distant speech recognition (DSR) systems. In this study, we present a multichannel spectrum enhancement method for reverberant speech recognition, which extends a single-channel dereverberation algorithm based on convolutive nonnegative matrix factorization (NMF). The generalization to a multichannel scenario is shown to be a special case of convolutive nonnegative tensor factorization (NTF). The presented algorithm integrates information across channels in the magnitude short-time Fourier transform (STFT) domain. As a result, it places no restrictions on the array geometry and requires no information about the source location, which makes it particularly suitable for distributed microphone arrays. Experiments are performed on speech data using actual room impulse responses from the AIR database. Relative word error rate (WER) improvements with a clean-trained ASR system range from +7.1% to +30.1%, depending on the number of channels and the source-to-microphone distance (1 to 3 meters).
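To make the idea of a multichannel convolutive factorization concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a commonly used model that the abstract does not spell out: each channel's magnitude spectrogram is approximated, per frequency bin, as a nonnegative filter convolved along time with a clean spectrogram shared by all channels, Y[c, k, n] ≈ sum over t of H[c, k, t] * X[k, n - t]. The Euclidean cost, the multiplicative updates, and all names (forward, dereverb_sketch, the filter length L) are illustrative assumptions.

```python
import numpy as np


def forward(H, X):
    """Assumed model: Y_hat[c, k, n] = sum_t H[c, k, t] * X[k, n - t]."""
    C, K, L = H.shape
    N = X.shape[1]
    Y_hat = np.zeros((C, K, N))
    for t in range(L):
        # Shift X by t frames and weight it with the t-th filter tap.
        Y_hat[:, :, t:] += H[:, :, t:t + 1] * X[None, :, :N - t]
    return Y_hat


def dereverb_sketch(Y, L=8, n_iter=50, eps=1e-12, seed=0):
    """Estimate a shared clean spectrogram X and per-channel filters H.

    Y: nonnegative array of shape (channels, freq_bins, frames), with L <= frames.
    Uses multiplicative updates for a squared-error cost (an assumption;
    the paper may use a different cost and additional constraints).
    """
    C, K, N = Y.shape
    rng = np.random.default_rng(seed)
    X = Y.mean(axis=0) + eps          # initialise with the channel average
    H = rng.random((C, K, L)) + eps   # random nonnegative filters
    costs = []
    for _ in range(n_iter):
        # Update X: evidence from every channel is summed (axis 0).
        Y_hat = forward(H, X)
        num = np.zeros((K, N))
        den = np.zeros((K, N))
        for t in range(L):
            num[:, :N - t] += (H[:, :, t:t + 1] * Y[:, :, t:]).sum(axis=0)
            den[:, :N - t] += (H[:, :, t:t + 1] * Y_hat[:, :, t:]).sum(axis=0)
        X *= num / (den + eps)

        # Update H: each channel keeps its own filter per frequency bin.
        Y_hat = forward(H, X)
        for t in range(L):
            num_h = (Y[:, :, t:] * X[None, :, :N - t]).sum(axis=2)
            den_h = (Y_hat[:, :, t:] * X[None, :, :N - t]).sum(axis=2)
            H[:, :, t] *= num_h / (den_h + eps)

        costs.append(float(((Y - forward(H, X)) ** 2).sum()))
    return X, H, costs


if __name__ == "__main__":
    # Synthetic check: 3 channels, 16 frequency bins, 100 frames.
    rng = np.random.default_rng(1)
    X_true = rng.random((16, 100))
    H_true = rng.random((3, 16, 8))
    Y = forward(H_true, X_true)
    X_est, H_est, costs = dereverb_sketch(Y)
    print("cost after first / last iteration:", costs[0], costs[-1])
```

Because the sketch works only on magnitudes and ties the channels together through the shared X, it needs no microphone positions or source direction, which is the property the abstract highlights for distributed arrays. The factorization has an inherent scale ambiguity between H and X, so a practical system would add a normalization or prior that this sketch omits.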


Bibliographic reference.  Mirsamadi, Seyedmahdad / Hansen, John H. L. (2014): "Multichannel speech dereverberation based on convolutive nonnegative tensor factorization for ASR applications", In INTERSPEECH-2014, 2828-2832.