This paper introduces a new model, the multichannel factorial hidden Markov model (MFHMM), for underdetermined blind signal separation (BSS). For monaural source separation, one successful approach applies non-negative matrix factorization (NMF) to the magnitude spectrogram of a mixture signal, interpreted as a non-negative matrix. Multichannel extensions of NMF, which allow spatial information to be used as an additional clue for source separation, have been proposed by several authors and have proved effective for underdetermined BSS. This approach assumes that an observed signal is a mixture of a limited number of source signals, each of which has a static power spectral density scaled by a time-varying amplitude. However, many source signals in the real world are non-stationary, and their spectral densities vary considerably over time. Moreover, many sources, including speech, tend to remain inactive for a while before switching to an active mode, implying that the total power of a source may depend on its underlying state. To characterize this non-stationarity, this paper extends the multichannel NMF model by modeling the transitions of each source's spectral densities and total power with a hidden Markov model (HMM). By letting each HMM contain states corresponding to active and inactive modes, we show that voice activity detection and source separation can be solved simultaneously through parameter inference in the present model. Experiments showed that the proposed algorithm improved the signal-to-distortion ratio by 7.65 dB over conventional multichannel NMF.
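To make the monaural baseline concrete, the following is a minimal sketch of standard multiplicative-update NMF (Euclidean cost) applied to a non-negative magnitude matrix. This illustrates the baseline technique only, not the paper's multichannel or HMM-based algorithm; the function name and parameters are illustrative.

```python
import numpy as np

def nmf(V, K, n_iter=200, seed=0, eps=1e-9):
    """Factorize a non-negative matrix V (F x T) as W @ H, where
    W (F x K) holds K spectral templates and H (K x T) holds their
    time-varying activations, via multiplicative updates for the
    Euclidean distance ||V - WH||^2 (Lee & Seung)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, T)) + eps
    for _ in range(n_iter):
        # Each update is element-wise non-negative and monotonically
        # non-increasing in the Euclidean cost.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

In a separation pipeline, V would be the magnitude of the mixture's short-time Fourier transform, and each source estimate would be reconstructed by masking the mixture with its share of the model, e.g. a Wiener-style mask built from subsets of the K components.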
Bibliographic reference. Higuchi, Takuya / Takeda, Hirofumi / Nakamura, Tomohiko / Kameoka, Hirokazu (2014): "A unified approach for underdetermined blind signal separation and source activity detection by multichannel factorial hidden Markov models", In INTERSPEECH-2014, 850-854.