This work demonstrates how deep learning techniques can be used to detect overlapped speech on independent short timeframes. A secondary objective is to understand how the duration of the signal frame influences the accuracy of the method. We trained a deep neural network with heterogeneous layers and obtained close to 80% inference accuracy on frames as short as 25 milliseconds. The proposed system provides higher detection quality than existing work and can predict overlapped speech with up to 3 simultaneous speakers. The method has low response latency and does not require much computing power.
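To make the notion of independent short timeframes concrete, the following is a minimal sketch of cutting a signal into consecutive, non-overlapping frames of a chosen duration. It is not the authors' pipeline: the 16 kHz sample rate, the function name `split_into_frames`, and the synthetic noise input are assumptions made purely for illustration, and the classifier itself is not shown.

```python
import numpy as np


def split_into_frames(signal, sample_rate, frame_ms):
    """Cut a 1-D signal into consecutive, non-overlapping frames.

    Hypothetical helper for illustration only. Trailing samples that do
    not fill a whole frame are dropped.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame duration is too short for this sample rate")
    n_frames = len(signal) // frame_len
    return np.asarray(signal)[: n_frames * frame_len].reshape(n_frames, frame_len)


# Assumed sample rate (not stated in the abstract).
SAMPLE_RATE = 16000

# One second of synthetic noise standing in for a speech recording.
rng = np.random.default_rng(0)
audio = rng.standard_normal(SAMPLE_RATE)

# 25 ms is the shortest frame duration mentioned in the abstract.
frames = split_into_frames(audio, SAMPLE_RATE, frame_ms=25)
print(frames.shape)
```

Under these assumptions a 25 ms frame holds 400 samples, so each frame would be classified on its own from a very small amount of signal, which is why frame duration is expected to affect accuracy.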
Cite as: Andrei, V., Cucu, H., Burileanu, C. (2017) Detecting Overlapped Speech on Short Timeframes Using Deep Learning. Proc. Interspeech 2017, 1198-1202, doi: 10.21437/Interspeech.2017-188
@inproceedings{andrei17_interspeech,
  author={Valentin Andrei and Horia Cucu and Corneliu Burileanu},
  title={{Detecting Overlapped Speech on Short Timeframes Using Deep Learning}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1198--1202},
  doi={10.21437/Interspeech.2017-188}
}