Detecting Overlapped Speech on Short Timeframes Using Deep Learning

Valentin Andrei, Horia Cucu, Corneliu Burileanu


This work demonstrates how deep learning techniques can be used to detect overlapped speech on independent short timeframes. A secondary objective is to show how the duration of the signal frame influences the accuracy of the method. We trained a deep neural network with heterogeneous layers and obtained close to 80% inference accuracy on frames as short as 25 milliseconds. The proposed system provides higher detection quality than existing work and can detect overlapped speech with up to 3 simultaneous speakers. The method has low response latency and does not require much computing power.
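To make the notion of independent short timeframes concrete, the following is a minimal sketch of cutting a signal into non-overlapping 25 ms frames, each of which could then be classified on its own. The 16 kHz sampling rate, the function name split_into_frames, and the choice to drop a trailing partial frame are assumptions made for illustration; the abstract does not specify them, and this is not the authors' preprocessing.

```python
def split_into_frames(samples, sample_rate_hz=16000, frame_ms=25):
    """Split a 1-D sequence of samples into non-overlapping frames.

    sample_rate_hz is an assumed value, not taken from the paper.
    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(sample_rate_hz * frame_ms / 1000)  # samples per frame
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# One second of silence at the assumed 16 kHz rate.
signal = [0.0] * 16000
frames = split_into_frames(signal)
```

At the assumed rate, a 25 ms frame holds 400 samples, so one second of audio yields 40 frames. Shorter frames reduce latency but give a classifier less context per decision, which is the trade-off the abstract refers to.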


DOI: 10.21437/Interspeech.2017-188

Cite as: Andrei, V., Cucu, H., Burileanu, C. (2017) Detecting Overlapped Speech on Short Timeframes Using Deep Learning. Proc. Interspeech 2017, 1198-1202, DOI: 10.21437/Interspeech.2017-188.


@inproceedings{Andrei2017,
  author={Valentin Andrei and Horia Cucu and Corneliu Burileanu},
  title={Detecting Overlapped Speech on Short Timeframes Using Deep Learning},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1198--1202},
  doi={10.21437/Interspeech.2017-188},
  url={http://dx.doi.org/10.21437/Interspeech.2017-188}
}