11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Multichannel Source Separation Based on Source Location Cue with Log-Spectral Shaping by Hidden Markov Source Model

Tomohiro Nakatani, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto

NTT Corporation, Japan

This paper proposes a multichannel source separation approach that exploits the statistical characteristics of source location cues, represented by inter-channel phase differences (IPD), and those of source log spectra, represented by hidden Markov models (HMM). With this approach, source separation is achieved by iterating two simple sub-procedures, namely the clustering of the time-frequency (TF) bins into individual sources and the independent updating of the model parameters of each source. An advantage of this approach is that the model parameters of each source can be updated independently of those of the other sources in each iteration, and thus the update can be computationally very efficient. We show by simulation experiments that the proposed method can greatly improve, in a computationally efficient manner, the quality of each source signal extracted from sound mixtures in terms of cepstral distortion, using a speaker-independent HMM composed of a very small number of states.
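The alternation between clustering TF bins and updating each source independently can be illustrated with a minimal sketch. This is not the paper's algorithm: it treats a single frequency band, scores each IPD observation with an unnormalized von Mises-style term, keeps a fixed concentration `kappa`, and leaves out the HMM log-spectral model entirely. The names `source_posteriors`, `cluster_ipd`, and `kappa` are hypothetical and chosen only for illustration.

```python
import math


def source_posteriors(ipds, means, kappa):
    """Soft-assign each IPD observation (radians) to the sources.

    Assumption: each source is scored by exp(kappa * cos(ipd - mean)),
    an unnormalized von Mises-style likelihood with a shared kappa.
    """
    posts = []
    for x in ipds:
        scores = [math.exp(kappa * math.cos(x - m)) for m in means]
        total = sum(scores)
        posts.append([s / total for s in scores])
    return posts


def cluster_ipd(ipds, init_means, kappa=5.0, n_iter=20):
    """Alternate bin clustering and independent per-source mean updates."""
    means = list(init_means)
    for _ in range(n_iter):
        # Sub-procedure 1: cluster the bins (soft masks per source).
        posts = source_posteriors(ipds, means, kappa)
        # Sub-procedure 2: update each source on its own, using only
        # its own mask (posterior-weighted circular mean of the IPDs).
        for k in range(len(means)):
            s = sum(p[k] * math.sin(x) for p, x in zip(posts, ipds))
            c = sum(p[k] * math.cos(x) for p, x in zip(posts, ipds))
            means[k] = math.atan2(s, c)
    # Final masks computed with the converged means.
    return means, source_posteriors(ipds, means, kappa)
```

In this sketch the update for source `k` reads only column `k` of the posteriors, which mirrors the independence property the abstract highlights. A full implementation would also need frequency-dependent IPD models and the HMM-based log-spectral term.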

Bibliographic reference.  Nakatani, Tomohiro / Araki, Shoko / Yoshioka, Takuya / Fujimoto, Masakiyo (2010): "Multichannel source separation based on source location cue with log-spectral shaping by hidden Markov source model", In INTERSPEECH-2010, 2766-2769.