The STC ASR System for the VOiCES from a Distance Challenge 2019

Ivan Medennikov, Yuri Khokhlov, Aleksei Romanenko, Ivan Sorokin, Anton Mitrofanov, Vladimir Bataev, Andrei Andrusenko, Tatiana Prisyach, Mariya Korenevskaya, Oleg Petrov, Alexander Zatvornitskiy

This paper describes the Speech Technology Center (STC) automatic speech recognition (ASR) system for the “VOiCES from a Distance Challenge 2019”. We participated in the Fixed condition of the ASR task, which means that the only training data available was an 80-hour subset of the LibriSpeech corpus. The main difficulty of the challenge is the mismatch between clean training data and distant, noisy development/evaluation data. To tackle this, we applied room acoustics simulation and weighted prediction error (WPE) dereverberation. We also utilized well-known speaker adaptation using x-vector speaker embeddings, as well as novel room acoustics adaptation with R-vector room impulse response (RIR) embeddings. The system used a lattice-level combination of 6 acoustic models based on different pronunciation dictionaries and input features. N-best hypotheses were rescored with 3 neural network language models (NNLMs) trained on both words and sub-word units. NNLMs were also explored for out-of-vocabulary (OOV) word handling by means of artificial text generation. The final system achieved a word error rate (WER) of 14.7% on the evaluation data, which is the best result in the challenge.
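For readers unfamiliar with the metric reported above, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the challenge's official scoring tool, which may normalize text differently and aggregates errors over the whole evaluation set rather than per utterance.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N for one reference/hypothesis pair.

    Illustrative sketch only: tokens are obtained by whitespace splitting,
    with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference. A corpus-level figure such as the one quoted in the abstract is normally obtained by summing the edit distances and the reference lengths over all utterances before dividing.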

This paper also appears in session Wed-SS-7-3.

Cite as: Medennikov, I., Khokhlov, Y., Romanenko, A., Sorokin, I., Mitrofanov, A., Bataev, V., Andrusenko, A., Prisyach, T., Korenevskaya, M., Petrov, O., Zatvornitskiy, A. (2019) The STC ASR System for the VOiCES from a Distance Challenge 2019. Proc. Interspeech 2019.

  author={Ivan Medennikov and Yuri Khokhlov and Aleksei Romanenko and Ivan Sorokin and Anton Mitrofanov and Vladimir Bataev and Andrei Andrusenko and Tatiana Prisyach and Mariya Korenevskaya and Oleg Petrov and Alexander Zatvornitskiy},
  title={{The STC ASR System for the VOiCES from a Distance Challenge 2019}},
  booktitle={Proc. Interspeech 2019}