International Workshop on Hands-Free Speech Communication (HSC2001)

April 9-11, 2001
Kyoto, Japan

Limitation of Frequency Domain Blind Source Separation for Convolutive Mixture of Speech

Shoko Araki (1), Shoji Makino (1), Tsuyoki Nishikawa (2), Hiroshi Saruwatari (2)

(1) NTT Communication Science Laboratories, Soraku-gun, Kyoto, Japan
(2) Nara Institute of Science and Technology, Japan

Despite several recent proposals for achieving Blind Source Separation (BSS) of realistic acoustic signals, separation performance remains insufficient. In particular, performance is severely limited when the room impulse response is long. In this paper, we show that it is useless to be constrained by the condition P ≪ T, where T is the FFT frame size and P is the length of the room impulse response. In our experiments, a frame size of 256 or 512 (32 or 64 ms at a sampling frequency of 8 kHz) is best even for long room reverberation times of TR = 150 and 300 ms. We also clarify the reason for the poor performance of BSS in highly reverberant environments: separation is achieved chiefly for the sound arriving from the direction of the jammer, because BSS cannot calculate the inverse of the room transfer function for both the target and jammer signals.
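The frame durations quoted in the abstract follow directly from the sampling frequency. The short sketch below reproduces that arithmetic and converts the two reverberation times into sample counts for comparison. The function names are illustrative only, and treating the reverberation time TR as a rough stand-in for the impulse response length P is an assumption made here for illustration, not a statement from the paper.

```python
FS_HZ = 8000  # sampling frequency stated in the abstract


def frame_ms(frame_size, fs_hz=FS_HZ):
    """Duration in milliseconds of an FFT frame of `frame_size` samples."""
    return 1000.0 * frame_size / fs_hz


def ms_to_samples(duration_ms, fs_hz=FS_HZ):
    """Number of samples spanned by `duration_ms` at sampling rate `fs_hz`."""
    return round(duration_ms * fs_hz / 1000.0)


if __name__ == "__main__":
    for frame_size in (256, 512):
        print(f"T = {frame_size} samples -> {frame_ms(frame_size):.0f} ms")
    for tr_ms in (150, 300):
        # Assumption: TR is used as a rough proxy for the response length P.
        print(f"TR = {tr_ms} ms -> {ms_to_samples(tr_ms)} samples")
```

Under that assumption, both reported frame sizes are shorter than the reverberation time expressed in samples, so the best-performing frames do not satisfy P ≪ T.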



Bibliographic reference.  Araki, Shoko / Makino, Shoji / Nishikawa, Tsuyoki / Saruwatari, Hiroshi (2001): "Limitation of frequency domain blind source separation for convolutive mixture of speech", In HSC2001, 91-94.