Web Data Selection Based on Word Embedding for Low-Resource Speech Recognition

Chuandong Xie, Wu Guo, Guoping Hu, Junhua Liu


The lack of transcription files leads to a high out-of-vocabulary (OOV) rate and a weak language model in low-resource speech recognition systems. This paper presents a web data selection method to augment these systems. After mapping all the words or short sentences to vectors in a low-dimensional space through a word embedding technique, the similarities between the web data and the small pool of training transcriptions are calculated. The web data with high similarity are then selected to expand the pronunciation lexicon or language model. Experiments are conducted on the NIST Open KWS15 Swahili VLLP recognition task. Compared with the baseline system, our methods achieve a 5.23% absolute reduction in word error rate (WER) using the expanded pronunciation lexicon and a 9.54% absolute WER reduction using both the expanded lexicon and language model.
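The selection pipeline summarized in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy embeddings, the mean-pooling of word vectors into sentence vectors, and the similarity threshold are all assumptions made for the example; the paper's actual embedding model and selection criterion are described in the full text.

```python
import math

# Hypothetical toy word embeddings, stand-ins for vectors learned by a
# word-embedding model (e.g. word2vec-style training on the transcriptions).
EMBED = {
    "speech":      [0.9, 0.1, 0.0],
    "recognition": [0.8, 0.2, 0.1],
    "model":       [0.7, 0.3, 0.2],
    "football":    [0.0, 0.9, 0.4],
    "match":       [0.1, 0.8, 0.5],
}
DIM = 3

def sentence_vector(sentence):
    """Map a sentence to a vector by averaging the embeddings of its
    in-vocabulary words (zero vector if none are in vocabulary)."""
    vecs = [EMBED[w] for w in sentence.lower().split() if w in EMBED]
    if not vecs:
        return [0.0] * DIM
    return [sum(c) / len(vecs) for c in zip(*vecs)]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv) if nu and nv else 0.0

def select_web_data(web_sentences, train_sentences, threshold=0.9):
    """Keep web sentences whose best similarity to any training
    transcription exceeds a (hypothetical) threshold."""
    train_vecs = [sentence_vector(s) for s in train_sentences]
    selected = []
    for s in web_sentences:
        v = sentence_vector(s)
        best = max((cosine(v, t) for t in train_vecs), default=0.0)
        if best >= threshold:
            selected.append(s)
    return selected
```

With the toy vectors above, a web sentence about speech recognition scores close to the small training pool and is kept, while an off-topic sentence about football falls below the threshold and is discarded; the retained text would then feed lexicon or language-model expansion.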


DOI: 10.21437/Interspeech.2016-45

Cite as

Xie, C., Guo, W., Hu, G., Liu, J. (2016) Web Data Selection Based on Word Embedding for Low-Resource Speech Recognition. Proc. Interspeech 2016, 1340-1344.

Bibtex
@inproceedings{Xie+2016,
author={Chuandong Xie and Wu Guo and Guoping Hu and Junhua Liu},
title={Web Data Selection Based on Word Embedding for Low-Resource Speech Recognition},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-45},
url={http://dx.doi.org/10.21437/Interspeech.2016-45},
pages={1340--1344}
}