Deep Clustering-Based Beamforming for Separation with Unknown Number of Sources

Takuya Higuchi, Keisuke Kinoshita, Marc Delcroix, Kateřina Žmolíková, Tomohiro Nakatani


This paper extends a deep clustering algorithm for use with time-frequency masking-based beamforming to perform separation with an unknown number of sources. Deep clustering is a recently proposed single-channel source separation algorithm that projects inputs into an embedding space and performs clustering in the embedding domain. In deep clustering, bidirectional long short-term memory (BLSTM) recurrent neural networks are trained to make embedding vectors orthogonal for different speakers and parallel for the same speaker. Then, by clustering the embedding vectors at test time, we can estimate time-frequency masks for separation. In this paper, we extend the deep clustering algorithm to a multiple-microphone setup and incorporate deep clustering-based time-frequency mask estimation into masking-based beamforming, which has been shown to be more effective than direct masking for automatic speech recognition. Moreover, we perform source counting by computing the rank of the covariance matrix of the embedding vectors. With our proposed approach, we can perform masking-based beamforming in a multiple-speaker case without knowing the number of speakers. Experimental results show that our proposed deep clustering-based beamformer achieves source separation performance comparable to that of a complex Gaussian mixture model-based beamformer, which requires the number of sources to be known in advance for mask estimation.
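The abstract describes three steps: counting sources from the rank of the embedding covariance matrix, clustering embeddings into time-frequency masks, and using those masks for beamforming. The NumPy sketch below illustrates these steps under several assumptions that are not taken from the paper. The trained BLSTM is not shown; the embeddings `V` are simply given as unit-norm rows, one per time-frequency bin. The "covariance" is taken as the uncentered matrix V^T V / N, and its rank is estimated by counting eigenvalues above a relative threshold (`rel_threshold` is a hypothetical tuning parameter). Masks come from plain k-means with a farthest-point initialization. The beamformer is a mask-based MVDR in the Souden et al. formulation, which is one common choice and not necessarily the exact beamformer used in the paper. All function names are hypothetical.

```python
import numpy as np


def count_sources(V, rel_threshold=0.1):
    """Estimate the number of sources as the numerical rank of R = V^T V / N.

    V: (N, D) unit-norm embeddings, one row per time-frequency bin.
    rel_threshold: hypothetical parameter; eigenvalues above
        rel_threshold * (largest eigenvalue) are counted as nonzero.
    """
    R = V.T @ V / V.shape[0]          # (D, D) uncentered covariance (assumption)
    eigvals = np.linalg.eigvalsh(R)   # real eigenvalues, ascending order
    return int(np.sum(eigvals > rel_threshold * eigvals[-1]))


def kmeans_masks(V, k, n_iter=20):
    """Cluster embeddings with k-means and return binary masks of shape (k, N)."""
    # Farthest-point initialization: an assumption chosen for determinism.
    idx = [0]
    for _ in range(1, k):
        d = ((V[:, None, :] - V[idx][None, :, :]) ** 2).sum(-1).min(axis=1)
        idx.append(int(d.argmax()))
    centroids = V[idx].copy()
    for _ in range(n_iter):
        # Assign each embedding to its nearest centroid (squared Euclidean).
        dist = ((V[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = V[labels == j].mean(axis=0)
    masks = np.zeros((k, V.shape[0]))
    masks[labels, np.arange(V.shape[0])] = 1.0
    return masks


def mask_mvdr(Y, mask, ref_mic=0, reg=1e-6):
    """Mask-based MVDR beamformer (Souden formulation, one common choice).

    Y: (F, T, M) complex multichannel STFT.
    mask: (F, T) target mask in [0, 1]; 1 - mask weights the interference.
    Returns the (F, T) beamformed estimate of the target at ref_mic.
    """
    F, T, M = Y.shape
    out = np.empty((F, T), dtype=complex)
    for f in range(F):
        y = Y[f]                      # (T, M) observations at frequency f
        m = mask[f][:, None]          # (T, 1)
        # Mask-weighted spatial covariances: sum_t m_t y_t y_t^H / sum_t m_t
        phi_s = (m * y).T @ y.conj() / max(m.sum(), 1e-10)
        phi_n = ((1 - m) * y).T @ y.conj() / max((1 - m).sum(), 1e-10)
        phi_n += reg * np.eye(M)      # diagonal loading keeps phi_n invertible
        num = np.linalg.solve(phi_n, phi_s)   # Phi_n^{-1} Phi_s
        w = num[:, ref_mic] / np.trace(num)   # (M,) filter coefficients
        out[f] = y @ w.conj()                 # w^H y_t for every frame t
    return out


if __name__ == "__main__":
    # Toy embeddings: three orthogonal directions plus small noise.
    rng = np.random.default_rng(0)
    D, sizes = 20, [100, 150, 200]
    V = np.concatenate([np.eye(D)[k] + 0.01 * rng.standard_normal((n, D))
                        for k, n in enumerate(sizes)])
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    k = count_sources(V)
    print("estimated number of sources:", k)
    print("mask shape:", kmeans_masks(V, k).shape)
```

In a full pipeline, each row of the mask array would be reshaped to (F, T) and passed to `mask_mvdr` together with the multichannel STFT, yielding one beamformed output per detected source. Using 1 - mask as the interference weight lumps all other sources and noise together, which is a simplifying assumption of this sketch.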


DOI: 10.21437/Interspeech.2017-721

Cite as: Higuchi, T., Kinoshita, K., Delcroix, M., Žmolíková, K., Nakatani, T. (2017) Deep Clustering-Based Beamforming for Separation with Unknown Number of Sources. Proc. Interspeech 2017, 1183-1187, DOI: 10.21437/Interspeech.2017-721.


@inproceedings{Higuchi2017,
  author={Takuya Higuchi and Keisuke Kinoshita and Marc Delcroix and Kateřina Žmolíková and Tomohiro Nakatani},
  title={Deep Clustering-Based Beamforming for Separation with Unknown Number of Sources},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1183--1187},
  doi={10.21437/Interspeech.2017-721},
  url={http://dx.doi.org/10.21437/Interspeech.2017-721}
}