This paper presents the JHU HLTCOE submission to the NIST 2015 Language Recognition Evaluation, including critical and novel algorithmic components, use of limited and augmented training data, and additional post-evaluation analysis and improvements. All of our systems used i-vectors based on Deep Neural Networks (DNNs) with discriminatively-trained Gaussian classifiers, and linear fusion was performed with duration-dependent scaling. A key innovation was the use of three different kinds of i-vectors: acoustic, phonotactic, and joint. In addition, data augmentation was used to overcome the limited training data of this evaluation. Post-evaluation analysis shows the benefits of these design decisions, as well as further potential improvements.
Cite as: McCree, A., Sell, G., Garcia-Romero, D. (2016) Augmented Data Training of Joint Acoustic/Phonotactic DNN i-vectors for NIST LRE15. Proc. The Speaker and Language Recognition Workshop (Odyssey 2016), 204-209, doi: 10.21437/Odyssey.2016-29
@inproceedings{mccree16_odyssey,
  author={Alan McCree and Greg Sell and Daniel Garcia-Romero},
  title={{Augmented Data Training of Joint Acoustic/Phonotactic DNN i-vectors for NIST LRE15}},
  year=2016,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2016)},
  pages={204--209},
  doi={10.21437/Odyssey.2016-29}
}