Sub-band Envelope Features Using Frequency Domain Linear Prediction for Short Duration Language Identification

Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah


Mismatch between training and testing utterances can significantly degrade the performance of language identification (LID) systems, especially for short duration utterances. This work explores the hypothesis that long-term trends are less affected by this mismatch than short-term features. In particular, it proposes features based on temporal envelopes within sub-bands. The temporal envelopes are obtained using linear prediction in the frequency domain and are then transformed into cepstral features. The proposed features are used as a front-end to a bidirectional long short-term memory recurrent neural network to identify languages. Experimental evaluations on the AP17-OLR dataset indicate that the proposed features are substantially more robust than baseline features under different noise and mismatch conditions. Specifically, the proposed features outperform state-of-the-art bottleneck features and show a relative improvement of 38.4% averaged across the test set.
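The envelope estimation step described in the abstract can be illustrated with a minimal sketch of frequency domain linear prediction. This is not the authors' implementation. It assumes a plain DCT-II, a rectangular sub-band selection over DCT indices, the autocorrelation method for linear prediction, and a log-plus-DCT step for the cepstral conversion; the function names, the model order, and the number of cepstral coefficients are all illustrative choices that do not come from the paper.

```python
import numpy as np


def dct_ii(x):
    """Unnormalised DCT-II by direct matrix product (O(N^2), fine for short frames)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x


def lp_coefficients(seq, order):
    """Autocorrelation-method linear prediction; returns (a, gain) with a[0] == 1.

    Assumes len(seq) > order and a non-silent input.
    """
    # Autocorrelation at lags 0..order.
    r = np.correlate(seq, seq, mode="full")[len(seq) - 1 : len(seq) + order]
    # Toeplitz normal equations, with a tiny ridge for numerical stability.
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[idx] + 1e-9 * r[0] * np.eye(order)
    a = np.concatenate(([1.0], np.linalg.solve(R, -r[1 : order + 1])))
    gain = r[0] + a[1:] @ r[1 : order + 1]
    return a, gain


def fdlp_envelope(x, order=20, band=None, n_points=None):
    """Approximate temporal envelope of x via linear prediction on its DCT.

    band: optional (lo, hi) range of DCT indices selecting a sub-band
          (a rectangular window; an assumption made for this sketch).
    """
    c = dct_ii(x)
    if band is not None:
        lo, hi = band
        c = c[lo:hi]
    a, gain = lp_coefficients(c, order)
    n_points = len(x) if n_points is None else n_points
    # Evaluating the all-pole model on [0, pi) gives the envelope over time,
    # because the DCT swaps the roles of time and frequency.
    A = np.fft.rfft(a, 2 * n_points)[:n_points]
    return gain / np.abs(A) ** 2


def envelope_cepstra(env, n_ceps=13):
    """Cepstral-style features: DCT of the log envelope, truncated to n_ceps."""
    return dct_ii(np.log(env))[:n_ceps]


# Toy input: a noise burst centred on sample 300 of a 512-sample frame.
rng = np.random.default_rng(0)
t = np.arange(512)
burst = np.exp(-0.5 * ((t - 300) / 20.0) ** 2) * rng.standard_normal(512)

env = fdlp_envelope(burst, order=20)                           # full-band envelope
sub_env = fdlp_envelope(burst, order=10, band=(0, 128))        # one sub-band
features = envelope_cepstra(env)
```

In this toy example the estimated envelope should concentrate around the location of the burst, which is the long-term temporal structure the abstract argues is less sensitive to mismatch. A real front-end would repeat the sub-band step over a bank of bands and feed the resulting feature sequence to the classifier.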


DOI: 10.21437/Interspeech.2018-1805

Cite as: Fernando, S., Sethu, V., Ambikairajah, E. (2018) Sub-band Envelope Features Using Frequency Domain Linear Prediction for Short Duration Language Identification. Proc. Interspeech 2018, 1818-1822, DOI: 10.21437/Interspeech.2018-1805.


@inproceedings{Fernando2018,
  author={Sarith Fernando and Vidhyasaharan Sethu and Eliathamby Ambikairajah},
  title={Sub-band Envelope Features Using Frequency Domain Linear Prediction for Short Duration Language Identification},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1818--1822},
  doi={10.21437/Interspeech.2018-1805},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1805}
}