Who Said That? A Comparative Study of Non-negative Matrix Factorization Techniques

Teun Krikke, Frank Broz, David Lane


In noisy environments, it is difficult for a computer to understand what a person is saying, especially when there are multiple speakers. In this paper, we concentrate on separating overlapping speech. Non-negative matrix factorisation (NMF) is a source separation method that does not need a lot of data. The choice of cost function can have a significant impact on the performance of NMF. We evaluate NMF using three different cost functions (Euclidean, Itakura-Saito and Kullback-Leibler), including modifications that use sparsity, convolution or additional information in the form of the direction of arrival. We conduct this evaluation on three different speech corpora. Adding directional information to NMF in the form of non-negative tensor factorisation (NTF) gives the best result on the map task and vocalization corpora, while the Itakura-Saito cost function performs best on the acoustic-camera corpus. Using acoustic evaluation measurements, we show that the Itakura-Saito cost function is the most robust cost function when the recording contains noise.
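To make the role of the cost function concrete, the following is a minimal sketch of NMF on a non-negative matrix `V` (for example, a magnitude or power spectrogram) under the three cost functions named above. It is not the authors' implementation: the abstract does not say how the factorisation is optimised, so the sketch assumes the commonly used multiplicative updates for the beta-divergence family, where beta = 2, 1 and 0 correspond to the Euclidean, Kullback-Leibler and Itakura-Saito costs. The function names `beta_divergence` and `nmf`, the rank, the iteration count and the small constant `eps` are illustrative choices, and the sparsity, convolutive and directional (NTF) variants are not covered.

```python
import numpy as np


def beta_divergence(V, V_hat, beta):
    """Sum of element-wise divergences between V and its approximation V_hat."""
    if beta == 2:  # Euclidean (squared error) cost
        return 0.5 * np.sum((V - V_hat) ** 2)
    if beta == 1:  # generalised Kullback-Leibler cost
        return np.sum(V * np.log(V / V_hat) - V + V_hat)
    if beta == 0:  # Itakura-Saito cost
        ratio = V / V_hat
        return np.sum(ratio - np.log(ratio) - 1)
    raise ValueError("this sketch only supports beta in {0, 1, 2}")


def nmf(V, rank, beta, n_iter=200, eps=1e-12, seed=0):
    """Factorise V (n_freq x n_frames) as W @ H with multiplicative updates.

    Assumption: heuristic multiplicative updates for the beta-divergence;
    the source paper may use a different optimisation scheme.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps  # spectral basis vectors
    H = rng.random((rank, n_frames)) + eps  # activations over time
    for _ in range(n_iter):
        V_hat = W @ H + eps
        H *= (W.T @ (V * V_hat ** (beta - 2))) / (W.T @ V_hat ** (beta - 1) + eps)
        V_hat = W @ H + eps
        W *= ((V * V_hat ** (beta - 2)) @ H.T) / (V_hat ** (beta - 1) @ H.T + eps)
    return W, H
```

Calling `nmf(V, rank=4, beta=0)` on a strictly positive matrix `V` returns non-negative factors fitted under the Itakura-Saito cost; changing `beta` to 1 or 2 switches to the Kullback-Leibler or Euclidean cost without altering the rest of the code. In a separation setting, groups of columns of `W` (with the matching rows of `H`) would then be assigned to individual speakers, a step this sketch leaves out.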


DOI: 10.21437/Interspeech.2018-1807

Cite as: Krikke, T., Broz, F., Lane, D. (2018) Who Said That? A Comparative Study of Non-negative Matrix Factorization Techniques. Proc. Interspeech 2018, 1234-1238, DOI: 10.21437/Interspeech.2018-1807.


@inproceedings{Krikke2018,
  author={Teun Krikke and Frank Broz and David Lane},
  title={Who Said That? A Comparative Study of Non-negative Matrix Factorization Techniques},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1234--1238},
  doi={10.21437/Interspeech.2018-1807},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1807}
}