Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network

Weipeng He, Petr Motlicek, Jean-Marc Odobez


We propose a novel multi-task neural network-based approach for joint sound source localization and speech/non-speech classification in noisy environments. The network takes the raw short-time Fourier transform as input and outputs the likelihood values for the two tasks, which are used for the simultaneous detection, localization and classification of an unknown number of overlapping sound sources. Tested with real recorded data, our method achieves significantly better performance in terms of speech/non-speech classification and localization of speech sources, compared to a method that performs localization and classification separately. In addition, we demonstrate that incorporating the temporal context can further improve the performance.
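
The abstract does not say how the two likelihood outputs are turned into a set of detected sources. As a rough illustration only, the sketch below assumes the network yields one localization likelihood and one speech likelihood per direction on a circular azimuth grid, and treats each local maximum above a threshold as one source. The function name, both thresholds, the neighbourhood width and the 360-direction grid are hypothetical choices, not values taken from the paper.

```python
import numpy as np


def decode_sources(loc_lik, speech_lik, loc_threshold=0.5,
                   speech_threshold=0.5, half_width=5):
    """Return (direction_index, is_speech) for each detected source.

    Hypothetical decoding rule: a direction counts as a source when its
    localization likelihood exceeds loc_threshold and is strictly larger
    than every neighbour within half_width grid steps (the grid wraps
    around). The speech likelihood at that direction gives the class.
    """
    loc_lik = np.asarray(loc_lik, dtype=float)
    speech_lik = np.asarray(speech_lik, dtype=float)
    n = loc_lik.shape[0]
    sources = []
    for i in range(n):
        if loc_lik[i] <= loc_threshold:
            continue
        # Neighbouring directions, wrapping around the circular grid.
        neighbours = [(i + k) % n
                      for k in range(-half_width, half_width + 1) if k != 0]
        if all(loc_lik[i] > loc_lik[j] for j in neighbours):
            sources.append((i, bool(speech_lik[i] > speech_threshold)))
    return sources


# Toy input standing in for network outputs (made-up values, 1 degree grid).
loc = np.zeros(360)
speech = np.zeros(360)
loc[39:42] = [0.6, 0.9, 0.6]   # a source around 40 degrees
speech[40] = 0.95              # ... that looks like speech
loc[200] = 0.8                 # a second source at 200 degrees
speech[200] = 0.1              # ... that looks like non-speech
print(decode_sources(loc, speech))
```

Because the number of peaks is not fixed in advance, a rule of this kind can report any number of overlapping sources, which is the property the abstract emphasizes.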


DOI: 10.21437/Interspeech.2018-1269

Cite as: He, W., Motlicek, P., Odobez, J. (2018) Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network. Proc. Interspeech 2018, 312-316, DOI: 10.21437/Interspeech.2018-1269.


@inproceedings{He2018,
  author={Weipeng He and Petr Motlicek and Jean-Marc Odobez},
  title={Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={312--316},
  doi={10.21437/Interspeech.2018-1269},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1269}
}