ISCA Workshop on
Statistical And Perceptual Audition

Makuhari, Japan
September 25, 2010

Online Speech Source Separation in Meeting Scene with Time-Varying Weights of Noise Covariance Matrices

Masahito Togami, Koichi Hori

Department of Aeronautics and Astronautics, School of Engineering, The University of Tokyo, Japan

We propose an online speech source separation technique for meeting situations. The goal is to extract each speech source online from a multichannel microphone input signal that is contaminated by the speech of the other participants (noise sources). The proposed method is an adaptive beamformer. It estimates the noise covariance matrix of the multichannel microphone input signal as a weighted average of the noise covariance matrices of the individual speech sources, each of which is estimated offline. The weights are derived from the estimated activity of each speech source. With the proposed method, the speech sources can be separated even when the noise covariance matrix of the microphone input signal changes rapidly due to nodding, interruption, or turn-taking. Experimental results indicate that the proposed method can track rapid changes in the noise covariance matrix and separate the speech sources correctly.
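
The core idea of the abstract, a noise covariance matrix formed as an activity-weighted average of per-source covariance matrices and then used by an adaptive beamformer, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which beamformer is used or how source activity is estimated, so the sketch assumes an MVDR beamformer, takes the activity values as given inputs, and uses hypothetical names throughout (`mix_noise_covariance`, `mvdr_weights`, `loading`).

```python
import numpy as np


def mix_noise_covariance(source_covs, activities, eps=1e-12):
    """Activity-weighted average of per-source noise covariance matrices.

    source_covs: array of shape (S, M, M), one covariance matrix per
                 interfering source (assumed to be estimated offline).
    activities:  array of shape (S,), nonnegative activity estimates for
                 the current frame (how they are obtained is not covered here).
    """
    covs = np.asarray(source_covs)
    w = np.asarray(activities, dtype=float)
    w = w / max(w.sum(), eps)  # normalize so the weights sum to one
    return np.tensordot(w, covs, axes=1)  # sum_s w[s] * covs[s]


def mvdr_weights(noise_cov, steering, loading=1e-6):
    """MVDR weights w = R^{-1} d / (d^H R^{-1} d).

    MVDR is an assumption of this sketch; the abstract only states that
    the method is an adaptive beamformer.
    """
    m = noise_cov.shape[0]
    # Diagonal loading keeps the linear solve well conditioned.
    scale = max(np.trace(noise_cov).real / m, 1e-12)
    r = noise_cov + loading * scale * np.eye(m)
    num = np.linalg.solve(r, steering)
    return num / (steering.conj() @ num)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = 4  # number of microphones (illustrative value)

    def random_cov():
        a = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
        return a @ a.conj().T + np.eye(m)  # Hermitian positive definite

    covs = np.stack([random_cov(), random_cov()])  # two interfering talkers
    d = np.exp(1j * rng.uniform(0, 2 * np.pi, m))  # target steering vector

    # The activity weights change from frame to frame, for example at a
    # turn-taking boundary, and the beamformer is recomputed each time.
    for act in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
        r = mix_noise_covariance(covs, act)
        w = mvdr_weights(r, d)
        print(act, abs(np.vdot(w, d)))  # gain toward the target direction
```

Because the per-source matrices are fixed in advance, only the scalar weights change per frame, which is what would let such a scheme follow abrupt changes in the interference without re-estimating a full covariance matrix online.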

Index Terms: speech source separation, online algorithm, beamforming


Bibliographic reference.  Togami, Masahito / Hori, Koichi (2010): "Online speech source separation in meeting scene with time-varying weights of noise covariance matrices", In SAPA-2010, 25-30.