An Investigation on the Use of i-Vectors for Robust ASR

Dimitrios Dimitriadis, Samuel Thomas, Sriram Ganapathy


In this paper we propose two different i-vector representations that improve the noise robustness of automatic speech recognition (ASR). The first kind of i-vector is derived from the "noise-only" components of speech provided by an adaptive denoising algorithm; the second variant is extracted from mel filterbank energies containing both speech and noise. The effectiveness of both representations is shown by combining them with two different kinds of spectral features: the commonly used log-mel filterbank energies and Teager energy spectral coefficients (TESCs). Using two different DNN architectures for acoustic modeling (a standard state-of-the-art sigmoid-based DNN and an advanced architecture using leaky ReLUs, dropout and rescaling), we demonstrate the benefit of the proposed representations. On the Aurora-4 multi-condition training task, the proposed front-end improves ASR performance by 4%.
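The abstract does not say exactly how the i-vectors are "combined" with the spectral features. A common way to feed an utterance-level i-vector to a DNN acoustic model is to append it to every frame of the frame-level features. The sketch below assumes that concatenation scheme and shows both i-vector variants (noise-only and noisy-speech) appended to log-mel frames. The function name `append_ivectors` and the toy dimensions are hypothetical and only illustrative; this is not presented as the authors' implementation.

```python
from typing import List, Sequence

Frame = List[float]


def append_ivectors(frames: Sequence[Sequence[float]],
                    *ivectors: Sequence[float]) -> List[Frame]:
    """Append one or more utterance-level i-vectors to every feature frame.

    Assumption: each i-vector is constant over the utterance and is simply
    concatenated to each frame (e.g. log-mel or TESC features) before the
    frames are passed to the DNN. The paper's exact recipe may differ.
    """
    if not frames:
        return []
    dim = len(frames[0])
    for t, frame in enumerate(frames):
        # All frames must share the same spectral feature dimension.
        if len(frame) != dim:
            raise ValueError(f"frame {t} has {len(frame)} dims, expected {dim}")
    # Build the utterance-level suffix once, then tile it onto every frame.
    suffix: List[float] = [x for ivec in ivectors for x in ivec]
    return [list(frame) + suffix for frame in frames]


if __name__ == "__main__":
    # Toy data: 3 frames of 4 hypothetical log-mel energies.
    log_mel = [[0.1, 0.2, 0.3, 0.4],
               [0.5, 0.6, 0.7, 0.8],
               [0.9, 1.0, 1.1, 1.2]]
    noise_ivec = [-0.3, 0.7]        # hypothetical "noise-only" i-vector
    noisy_speech_ivec = [0.2, 0.1]  # hypothetical speech-plus-noise i-vector
    augmented = append_ivectors(log_mel, noise_ivec, noisy_speech_ivec)
    print(len(augmented), len(augmented[0]))
```

Because the i-vector is fixed per utterance, it gives every frame the same summary of the acoustic (here, noise) conditions. This lets the network condition its frame-level decisions on that context without changing the spectral front-end itself.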


DOI: 10.21437/Interspeech.2016-1482

Cite as

Dimitriadis, D., Thomas, S., Ganapathy, S. (2016) An Investigation on the Use of i-Vectors for Robust ASR. Proc. Interspeech 2016, 3828-3832.

BibTeX
@inproceedings{Dimitriadis+2016,
author={Dimitrios Dimitriadis and Samuel Thomas and Sriram Ganapathy},
title={An Investigation on the Use of i-Vectors for Robust ASR},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1482},
url={http://dx.doi.org/10.21437/Interspeech.2016-1482},
pages={3828--3832}
}