ISCA Archive Odyssey 2018

Spoken Language Recognition using X-vectors

David Snyder, Daniel Garcia-Romero, Alan McCree, Gregory Sell, Daniel Povey, Sanjeev Khudanpur

In this paper, we apply x-vectors to the task of spoken language recognition. This framework consists of a deep neural network that maps sequences of speech features to fixed-dimensional embeddings, called x-vectors. Long-term language characteristics are captured in the network by a temporal pooling layer that aggregates information across time. Once extracted, x-vectors are classified using the same technology developed for i-vectors. In the 2017 NIST language recognition evaluation, x-vectors achieved excellent results and outperformed our state-of-the-art i-vector systems. In the post-evaluation analysis presented here, we experiment with several variations of the x-vector framework, and find that the best-performing system uses multilingual bottleneck features, data augmentation, and a discriminative Gaussian classifier.
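The temporal pooling step is what turns a variable-length utterance into a fixed-dimensional vector. A minimal sketch of the idea follows, assuming the pooling computes a per-dimension mean and standard deviation over frames (the abstract does not specify the statistics). The function name `stats_pool` and the toy inputs are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence


def stats_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Collapse T frame-level vectors of size D into one vector of size 2*D.

    Assumption: pooling is the per-dimension mean followed by the
    per-dimension (population) standard deviation across time.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    dim = len(frames[0])
    # Mean of each feature dimension across all frames.
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    # Standard deviation of each feature dimension across all frames.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(dim)
    ]
    return means + stds


# Two "utterances" of different lengths yield vectors of the same size.
short_utt = [[1.0, 2.0], [3.0, 2.0]]   # 2 frames, 2 dimensions
long_utt = short_utt * 50              # 100 frames, 2 dimensions
assert len(stats_pool(short_utt)) == len(stats_pool(long_utt)) == 4
```

In a full system, the layers after pooling would operate on this fixed-size vector, and the embedding would be read from one of those layers. That part is omitted here.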


doi: 10.21437/Odyssey.2018-15

Cite as: Snyder, D., Garcia-Romero, D., McCree, A., Sell, G., Povey, D., Khudanpur, S. (2018) Spoken Language Recognition using X-vectors. Proc. The Speaker and Language Recognition Workshop (Odyssey 2018), 105-111, doi: 10.21437/Odyssey.2018-15

@inproceedings{snyder18_odyssey,
  author={David Snyder and Daniel Garcia-Romero and Alan McCree and Gregory Sell and Daniel Povey and Sanjeev Khudanpur},
  title={{Spoken Language Recognition using X-vectors}},
  year=2018,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2018)},
  pages={105--111},
  doi={10.21437/Odyssey.2018-15}
}