Improving English Conversational Telephone Speech Recognition

Ivan Medennikov, Alexey Prudnikov, Alexander Zatvornitskiy


The goal of this work is to build a state-of-the-art English conversational telephone speech recognition system. We investigated several techniques to improve acoustic modeling, namely speaker-dependent bottleneck features, deep Bidirectional Long Short-Term Memory (BLSTM) recurrent neural networks, data augmentation, and score fusion of DNN and BLSTM models. The training set consisted of the 300-hour Switchboard English speech corpus. We also examined hypothesis rescoring with language models based on recurrent neural networks. The resulting system achieves a word error rate of 7.8% on the Switchboard part of the HUB5 2000 evaluation set, which is a competitive result.
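
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a word error rate can be computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the authors' scoring procedure, and evaluation scoring usually applies text normalization that is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokens are split on whitespace and compared
    exactly, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

As a usage example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words. Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.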


DOI: 10.21437/Interspeech.2016-473

Cite as

Medennikov, I., Prudnikov, A., Zatvornitskiy, A. (2016) Improving English Conversational Telephone Speech Recognition. Proc. Interspeech 2016, 2-6.

Bibtex
@inproceedings{Medennikov+2016,
author={Ivan Medennikov and Alexey Prudnikov and Alexander Zatvornitskiy},
title={Improving English Conversational Telephone Speech Recognition},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-473},
url={http://dx.doi.org/10.21437/Interspeech.2016-473},
pages={2--6}
}