An Approach to Online Speaker Change Point Detection Using DNNs and WFSTs

Lukas Mateju, Petr Cerva, Jindrich Zdansky


In this paper, a new approach to speaker change point (SCP) detection is presented. The method is suitable for online applications (e.g., real-time broadcast monitoring). It was designed through a series of consecutive experiments targeting both detection quality and low latency. The resulting scheme uses a convolutional neural network (CNN) whose output is smoothed by a decoder. The CNN is trained on data complemented by artificial examples to reduce different types of errors, and the decoder is based on a weighted finite state transducer (WFST) with a forced length of the transition model. Experiments on data taken from the COST278 database show that our approach yields results comparable to those of the offline multi-pass LIUM toolkit while operating online with low latency.
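To illustrate what smoothing a frame-level network output with a decoder can look like, here is a minimal sketch. It is not the authors' WFST decoder: it swaps in a plain two-state Viterbi pass over per-frame change probabilities, and the function name `viterbi_smooth` and the parameter `switch_penalty` are assumptions made for this example only.

```python
import math


def viterbi_smooth(change_probs, switch_penalty=2.0):
    """Smooth per-frame change probabilities with a two-state Viterbi pass.

    State 0 = "no change", state 1 = "change". Switching between states
    costs `switch_penalty` (a hypothetical tuning value in negative-log
    units). Returns one 0/1 label per input frame.
    """
    eps = 1e-10
    n = len(change_probs)
    if n == 0:
        return []

    # Per-frame cost of each state: negative log-probability.
    costs = [(-math.log(max(1.0 - p, eps)), -math.log(max(p, eps)))
             for p in change_probs]

    best = list(costs[0])   # best path cost ending in each state
    back = []               # back-pointers for frames 1..n-1

    for t in range(1, n):
        new_best = [0.0, 0.0]
        pointers = [0, 0]
        for s in (0, 1):
            stay = best[s]
            switch = best[1 - s] + switch_penalty
            if stay <= switch:
                new_best[s], pointers[s] = stay + costs[t][s], s
            else:
                new_best[s], pointers[s] = switch + costs[t][s], 1 - s
        best = new_best
        back.append(pointers)

    # Trace the cheapest path back from the last frame.
    state = 0 if best[0] <= best[1] else 1
    labels = [state]
    for pointers in reversed(back):
        state = pointers[state]
        labels.append(state)
    labels.reverse()
    return labels


if __name__ == "__main__":
    # An isolated one-frame spike is suppressed; a sustained run is kept.
    print(viterbi_smooth([0.1, 0.1, 0.9, 0.1, 0.1]))
    print(viterbi_smooth([0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1]))
```

The switching penalty plays the role of a simple transition model: a larger value requires more sustained evidence before the decoded label changes, which trades latency against spurious detections.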


DOI: 10.21437/Interspeech.2019-1407

Cite as: Mateju, L., Cerva, P., Zdansky, J. (2019) An Approach to Online Speaker Change Point Detection Using DNNs and WFSTs. Proc. Interspeech 2019, 649-653, DOI: 10.21437/Interspeech.2019-1407.


@inproceedings{Mateju2019,
  author={Lukas Mateju and Petr Cerva and Jindrich Zdansky},
  title={{An Approach to Online Speaker Change Point Detection Using DNNs and WFSTs}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={649--653},
  doi={10.21437/Interspeech.2019-1407},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1407}
}