Pair-Wise Distance Metric Learning of Neural Network Model for Spoken Language Identification

Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai


The i-vector representation and modeling technique has been successfully applied in spoken language identification (SLI). Because the i-vector representation encodes most of the acoustic variations (e.g., speaker variation and transmission channel variation), a discriminative transform or classifier must be applied during modeling to emphasize the variations correlated with language identity. Owing to the strong nonlinear discriminative power of neural network (NN) models (including their deep form, DNNs), NNs have been used directly to learn the mapping function between the i-vector representation and language identity labels. In most studies, only point-wise feature-label information is fed to the NN for parameter learning, which may result in model overfitting, particularly when training data are limited. In this study, we propose to integrate pair-wise distance metric learning into NN parameter optimization. In the representation space formed by the nonlinear transforms of the hidden layers, distance metric learning is explicitly designed to minimize pair-wise intra-class variation and maximize inter-class variation. With the distance metric as a constraint in point-wise learning, the i-vectors are transformed into a new feature space in which samples belonging to different languages are much more discriminable, while samples belonging to the same language are much more similar. We tested the algorithm on an SLI task and obtained encouraging results, with more than 20% relative improvement in identification error rate.
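The abstract does not give the exact form of the pair-wise constraint, so the following is only a minimal sketch under stated assumptions: the point-wise term is softmax cross-entropy on the network outputs, and the pair-wise term is a generic contrastive (hinge) loss on hidden-layer vectors that pulls same-language pairs together and pushes different-language pairs at least `margin` apart. The functions and the weighting parameter `lam` are hypothetical illustrations, not the authors' formulation.

```python
import math
from itertools import combinations


def softmax_cross_entropy(logits, label):
    """Point-wise loss: -log softmax(logits)[label], computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def sq_dist(a, b):
    """Squared Euclidean distance between two hidden-layer vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pairwise_metric_loss(hidden, labels, margin=1.0):
    """Assumed contrastive-style pair-wise term, averaged over all pairs.

    Same-language pairs contribute their squared distance (intra-class
    variation to minimize); different-language pairs contribute a hinge
    penalty when closer than `margin` (inter-class separation to enforce).
    """
    total, n_pairs = 0.0, 0
    for i, j in combinations(range(len(hidden)), 2):
        d2 = sq_dist(hidden[i], hidden[j])
        if labels[i] == labels[j]:
            total += d2
        else:
            total += max(0.0, margin - math.sqrt(d2)) ** 2
        n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


def combined_loss(logits, hidden, labels, lam=0.1, margin=1.0):
    """Point-wise cross-entropy plus a weighted pair-wise constraint (hypothetical weighting)."""
    point = sum(softmax_cross_entropy(z, y) for z, y in zip(logits, labels)) / len(labels)
    return point + lam * pairwise_metric_loss(hidden, labels, margin)


if __name__ == "__main__":
    hidden = [[0.0, 0.0], [0.1, 0.0], [2.0, 0.0]]  # toy hidden-layer outputs
    labels = [0, 0, 1]                               # toy language labels
    print(pairwise_metric_loss(hidden, labels))
```

In this toy batch the two same-language vectors are close and the third vector is already more than `margin` away from both, so only the small intra-class distance contributes. In a real NN, both terms would be minimized jointly by gradient descent so that the hidden representation becomes more language-discriminative.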


DOI: 10.21437/Interspeech.2016-722

Cite as

Lu, X., Shen, P., Tsao, Y., Kawai, H. (2016) Pair-Wise Distance Metric Learning of Neural Network Model for Spoken Language Identification. Proc. Interspeech 2016, 3216-3220.

Bibtex
@inproceedings{Lu+2016,
author={Xugang Lu and Peng Shen and Yu Tsao and Hisashi Kawai},
title={Pair-Wise Distance Metric Learning of Neural Network Model for Spoken Language Identification},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-722},
url={http://dx.doi.org/10.21437/Interspeech.2016-722},
pages={3216--3220}
}