Augmented Data Training of Joint Acoustic/Phonotactic DNN i-vectors for NIST LRE15

Alan Mccree, Greg Sell, Daniel Garcia-Romero


This paper presents the JHU HLTCOE submission to the NIST 2015 Language Recognition Evaluation, including critical and novel algorithmic components, use of limited and augmented training data, and additional post-evaluation analysis and improvements. All of our systems used i-vectors based on Deep Neural Networks (DNNs) with discriminatively trained Gaussian classifiers, and linear fusion was performed with duration-dependent scaling. A key innovation was the use of three different kinds of i-vectors: acoustic, phonotactic, and joint. In addition, data augmentation was used to overcome the limited training data of this evaluation. Post-evaluation analysis shows the benefits of these design decisions, as well as further potential improvements.


DOI: 10.21437/Odyssey.2016-29

Cite as

Mccree, A., Sell, G., Garcia-Romero, D. (2016) Augmented Data Training of Joint Acoustic/Phonotactic DNN i-vectors for NIST LRE15. Proc. Odyssey 2016, 204-209.

BibTeX
@inproceedings{Mccree+2016,
author={Alan Mccree and Greg Sell and Daniel Garcia-Romero},
title={Augmented Data Training of Joint Acoustic/Phonotactic DNN i-vectors for NIST LRE15},
year=2016,
booktitle={Odyssey 2016},
doi={10.21437/Odyssey.2016-29},
url={http://dx.doi.org/10.21437/Odyssey.2016-29},
pages={204--209}
}