This article presents a low-latency speaker diarization system (“who is speaking now?”) based on a hybrid approach that combines a traditional offline speaker diarization system (“who spoke when?”) with an online speaker identification system. The system fulfills all requirements of the diarization task, i.e., it needs no a priori information about the input and, in particular, no speaker-specific models. After an initialization phase, the approach allows a low-latency decision on the current speaker with an accuracy close to that of the underlying offline diarization system. The article describes the approach, evaluates the robustness of the system, and analyzes the latency/accuracy trade-off.
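The hybrid idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the models used, so the k-means clustering, the diagonal Gaussian speaker models, and the function names `offline_diarize`, `train_speaker_models`, and `identify_speaker` are all assumptions made for illustration. The sketch also fixes the number of speakers in advance for simplicity, which the system described above does not require.

```python
import numpy as np

def offline_diarize(features, n_speakers, n_iter=20):
    """Stand-in for the offline stage (assumption): plain k-means over frames."""
    # Deterministic farthest-point initialisation of the centroids.
    centroids = [features[0]]
    for _ in range(1, n_speakers):
        d = np.min([np.sum((features - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(features[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_speakers):
            if np.any(labels == k):  # keep the old centroid if a cluster is empty
                centroids[k] = features[labels == k].mean(axis=0)
    return labels

def train_speaker_models(features, labels):
    """Fit one diagonal Gaussian per speaker label found by the offline stage."""
    models = {}
    for spk in np.unique(labels):
        x = features[labels == spk]
        models[int(spk)] = (x.mean(axis=0), x.var(axis=0) + 1e-6)
    return models

def identify_speaker(window, models):
    """Online stage: return the speaker whose model best explains a short window."""
    def loglik(mean, var):
        # Log-likelihood of all frames in the window, summed over frames and dimensions.
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (window - mean) ** 2 / var)
    return max(models, key=lambda spk: loglik(*models[spk]))
```

In this sketch the initialization phase corresponds to running `offline_diarize` and `train_speaker_models` once on a buffered stretch of audio features. Each later call to `identify_speaker` then needs only a short window of frames, and the window length is the knob that trades latency against accuracy.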
Bibliographic reference. Vaquero, Carlos / Vinyals, Oriol / Friedland, Gerald (2010): "A hybrid approach to online speaker diarization", In INTERSPEECH-2010, 2638-2641.