Speaker Change Detection in Broadcast TV Using Bidirectional Long Short-Term Memory Networks

Ruiqing Yin, Hervé Bredin, Claude Barras


Speaker change detection is an important step in a speaker diarization system. It aims to find speaker change points in the audio stream. In this paper, it is treated as a sequence labeling task and addressed with bidirectional long short-term memory networks (Bi-LSTM). The system is trained and evaluated on the broadcast TV subset of the ETAPE database. The results show that the proposed model improves on conventional methods based on the Bayesian information criterion (BIC) and Gaussian divergence. For instance, compared to Gaussian divergence, it produces speech turns that are 19.5% longer on average, with the same level of purity.
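
To illustrate what treating change detection as sequence labeling can look like, the sketch below converts reference change times into one binary label per frame, the kind of target a sequence model such as a Bi-LSTM could be trained to predict. It is a minimal sketch, not the authors' implementation: the function name `change_labels`, the frame `step`, and the `tolerance` window are hypothetical values chosen for illustration and do not come from the paper.

```python
def change_labels(change_times, duration, step=0.1, tolerance=0.05):
    """Return one 0/1 label per frame; 1 marks a frame near a speaker change.

    All parameters are illustrative assumptions:
      change_times -- reference change points, in seconds
      duration     -- length of the audio, in seconds
      step         -- hop between consecutive frames, in seconds
      tolerance    -- half-width of the window labeled positive around a change
    """
    n_frames = int(round(duration / step))
    labels = [0] * n_frames
    for i in range(n_frames):
        center = (i + 0.5) * step  # time at the middle of frame i
        if any(abs(center - t) <= tolerance for t in change_times):
            labels[i] = 1
    return labels


# Two speaker changes in a one-second excerpt (made-up times).
labels = change_labels([0.32, 0.75], duration=1.0)
print(labels)
```

A wider `tolerance` marks more frames around each change as positive, which is one common way to reduce the imbalance between the few change frames and the many non-change frames.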


DOI: 10.21437/Interspeech.2017-65

Cite as: Yin, R., Bredin, H., Barras, C. (2017) Speaker Change Detection in Broadcast TV Using Bidirectional Long Short-Term Memory Networks. Proc. Interspeech 2017, 3827-3831, DOI: 10.21437/Interspeech.2017-65.


@inproceedings{Yin2017,
  author={Ruiqing Yin and Hervé Bredin and Claude Barras},
  title={Speaker Change Detection in Broadcast TV Using Bidirectional Long Short-Term Memory Networks},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3827--3831},
  doi={10.21437/Interspeech.2017-65},
  url={http://dx.doi.org/10.21437/Interspeech.2017-65}
}