Investigation of Senone-based Long-Short Term Memory RNNs for Spoken Language Recognition

Yao Tian, Liang He, Yi Liu, Jia Liu


Recently, integrating deep neural networks (DNNs) trained to predict senone posteriors with conventional language modeling methods has proven effective for spoken language recognition. This work extends several of these senone-based DNN frameworks by replacing the DNN with a long short-term memory recurrent neural network (LSTM RNN). Two of the approaches use the LSTM RNN to generate features: activations are extracted from its recurrent projection layer, either as frame-level acoustic features or as utterance-level features, and are then processed in different ways to produce scores for each target language. In the third approach, the conventional i-vector model is modified so that the LSTM RNN produces the frame alignments used for sufficient statistics extraction. Experiments on the NIST LRE 2015 evaluation demonstrate the effectiveness of the proposed methods.
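As a rough illustration of the third approach, posterior-based sufficient statistics replace the usual GMM-UBM occupation counts with per-frame class posteriors from a neural network. The sketch below (names and shapes are illustrative, not from the paper) shows the zeroth- and first-order statistics computed from frame features and alignment posteriors:

```python
import numpy as np

def sufficient_stats(features, posteriors):
    """Zeroth- and first-order sufficient statistics.

    features:   (T, D) frame-level acoustic features
    posteriors: (T, C) per-frame alignment posteriors, e.g. senone
                posteriors from a network (each row sums to 1)
    Returns N with shape (C,) and F with shape (C, D).
    """
    N = posteriors.sum(axis=0)   # zeroth-order: soft frame counts per class
    F = posteriors.T @ features  # first-order: posterior-weighted feature sums
    return N, F

# Toy example: 4 frames, 2-dim features, 3 alignment classes.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 2))
logits = rng.standard_normal((4, 3))
gamma = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
N, F = sufficient_stats(X, gamma)
```

These statistics would then feed the standard i-vector extractor; only the source of the alignments changes.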


DOI: 10.21437/Odyssey.2016-13

Cite as

Tian, Y., He, L., Liu, Y., Liu, J. (2016) Investigation of Senone-based Long-Short Term Memory RNNs for Spoken Language Recognition. Proc. Odyssey 2016, 89-93.

Bibtex
@inproceedings{Tian+2016,
author={Yao Tian and Liang He and Yi Liu and Jia Liu},
title={Investigation of Senone-based Long-Short Term Memory RNNs for Spoken Language Recognition},
year=2016,
booktitle={Odyssey 2016},
doi={10.21437/Odyssey.2016-13},
url={http://dx.doi.org/10.21437/Odyssey.2016-13},
pages={89--93}
}