ISCA Archive CHiME 2020

The STC System for the CHiME-6 Challenge

Ivan Medennikov, Maxim Korenevsky, Tatiana Prisyach, Yuri Khokhlov, Mariya Korenevskaya, Ivan Sorokin, Tatiana Timofeeva, Anton Mitrofanov, Andrei Andrusenko, Ivan Podluzhny, Aleksandr Laptev, Aleksei Romanenko

This paper describes the Speech Technology Center (STC) systems for the CHiME-6 challenge, which is aimed at multi-microphone, multi-speaker speech recognition and diarization in a dinner party scenario. We participated in both Track 1 and Track 2 and submitted results for both Ranking A and Ranking B in each track.

The Track 1 system used soft-activity-based Guided Source Separation (GSS) as a front-end, together with a combination of advanced acoustic modeling techniques: GSS-based training data augmentation, multi-stride and multi-stream self-attention layers, a statistics layer, and SpecAugment, as well as lattice-level fusion of acoustic models. This system was among the top three for Track 1, achieving a 30% relative WER reduction over the baseline. For Ranking B, lattice rescoring with a neural language model was additionally applied, bringing the overall relative WER reduction over the baseline in Track 1 to 34%.
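As an illustration of one of the augmentation techniques named above, the following is a minimal sketch of SpecAugment-style time and frequency masking. It is not the authors' implementation: the function name `spec_augment`, the mask counts and widths, the zero fill value, and the omission of time warping are all assumptions made for this example.

```python
import random


def spec_augment(features, num_freq_masks=2, max_freq_width=8,
                 num_time_masks=2, max_time_width=20, seed=None):
    """Return a masked copy of `features`, a list of frames (lists of floats).

    All default values are illustrative assumptions, not settings from the paper.
    """
    rng = random.Random(seed)
    out = [list(frame) for frame in features]  # copy so the input stays intact
    num_frames = len(out)
    num_bins = len(out[0]) if out else 0

    # Frequency masking: zero a random band of bins in every frame.
    for _ in range(num_freq_masks):
        width = min(rng.randint(0, max_freq_width), num_bins)
        start = rng.randint(0, num_bins - width)
        for frame in out:
            frame[start:start + width] = [0.0] * width

    # Time masking: zero a random run of consecutive frames.
    for _ in range(num_time_masks):
        width = min(rng.randint(0, max_time_width), num_frames)
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            out[t] = [0.0] * num_bins

    return out


# Usage: mask a dummy 100-frame, 40-bin feature matrix.
dummy = [[1.0] * 40 for _ in range(100)]
masked = spec_augment(dummy, seed=0)
```

Masking is applied to a copy, so the same utterance can be augmented differently on each training epoch.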

For Track 2, we proposed a novel Target-Speaker Voice Activity Detection (TS-VAD) approach to solve the diarization problem. The good diarization results made it possible to perform GSS on the obtained segments. TS-VAD is based on i-vector speaker embeddings, which are initially estimated using a strong diarization system based on spectral clustering of x-vectors. The back-end from the Track 1 system was reused in Track 2. The Track 2 system demonstrated state-of-the-art performance, outperforming the baseline by 39% in DER, 45% in JER, 43% in WER (Ranking A), and 45% in WER (Ranking B), all relative.
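The improvements above are reported as relative reductions of an error metric (WER, DER, or JER) with respect to the baseline. A minimal sketch of that computation follows; the helper name `relative_reduction` and the numbers in the usage lines are hypothetical and are not figures from the paper.

```python
def relative_reduction(baseline, system):
    """Relative reduction, in percent, of an error metric versus a baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline - system) / baseline


# Hypothetical error rates chosen only to illustrate the formula.
hypothetical_baseline_wer = 50.0
hypothetical_system_wer = 35.0
print(relative_reduction(hypothetical_baseline_wer, hypothetical_system_wer))  # 30.0
```

A relative reduction differs from an absolute one: in the hypothetical case above, the absolute reduction is 15 percentage points, while the relative reduction is 30%.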

doi: 10.21437/CHiME.2020-9

Cite as: Medennikov, I., Korenevsky, M., Prisyach, T., Khokhlov, Y., Korenevskaya, M., Sorokin, I., Timofeeva, T., Mitrofanov, A., Andrusenko, A., Podluzhny, I., Laptev, A., Romanenko, A. (2020) The STC System for the CHiME-6 Challenge. Proc. 6th International Workshop on Speech Processing in Everyday Environments (CHiME 2020), 36-41, doi: 10.21437/CHiME.2020-9

  author={Ivan Medennikov and Maxim Korenevsky and Tatiana Prisyach and Yuri Khokhlov and Mariya Korenevskaya and Ivan Sorokin and Tatiana Timofeeva and Anton Mitrofanov and Andrei Andrusenko and Ivan Podluzhny and Aleksandr Laptev and Aleksei Romanenko},
  title={{The STC System for the CHiME-6 Challenge}},
  booktitle={Proc. 6th International Workshop on Speech Processing in Everyday Environments (CHiME 2020)},