Selection of Multi-Genre Broadcast Data for the Training of Automatic Speech Recognition Systems

P. Lanchantin, Mark J.F. Gales, Penny Karanasou, X. Liu, Y. Qian, L. Wang, P.C. Woodland, C. Zhang


This paper compares schemes for the selection of multi-genre broadcast data and corresponding transcriptions for speech recognition model training. Selections of the same amount of data (700 hours), drawn from lightly supervised alignments based on the same original subtitle transcripts, are compared. Data segments were selected according to a maximum phone matched error rate between the lightly supervised decoding and the original transcript. The data selected with an improved lightly supervised system yields lower word error rates (WERs). Detailed comparisons on carefully transcribed development data show how well the selected portions match the true phone error rate for each genre. More broadly, it is shown that, depending on the genre, either the original subtitles or the lightly supervised output should be used for model training, and that a suitable combination of the two yields further reductions in final WER.
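The abstract does not spell out how the phone matched error rate (PMER) is computed or how the 700-hour budget is filled, so the following is only a minimal sketch under stated assumptions. It assumes PMER is the phone-level Levenshtein error count between the lightly supervised decoding and the subtitle transcript, normalised by the number of subtitle phones. It also assumes the fixed data budget is met by keeping the lowest-PMER segments first, so that the PMER of the last kept segment acts as the effective maximum-PMER threshold. The names `Segment`, `phone_mer` and `select_segments` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Segment:
    """Hypothetical container for one broadcast segment."""
    seg_id: str
    duration_s: float            # segment length in seconds
    subtitle_phones: List[str]   # phones of the original subtitle transcript
    decoded_phones: List[str]    # phones of the lightly supervised decoding


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1]


def phone_mer(seg: Segment) -> float:
    """Assumed PMER: phone edit errors divided by the subtitle phone count."""
    if not seg.subtitle_phones:
        # No reference phones: treat any decoded output as unmatchable.
        return float("inf") if seg.decoded_phones else 0.0
    return edit_distance(seg.subtitle_phones, seg.decoded_phones) / len(seg.subtitle_phones)


def select_segments(segments: List[Segment], budget_hours: float,
                    max_pmer: Optional[float] = None) -> Tuple[List[str], float]:
    """Keep the lowest-PMER segments until the hour budget would be exceeded.

    Returns the selected segment ids and the PMER of the last kept segment,
    i.e. the effective maximum-PMER threshold (assumed selection rule).
    """
    scored = sorted(((phone_mer(s), s) for s in segments), key=lambda x: x[0])
    budget_s = budget_hours * 3600.0
    selected: List[str] = []
    total_s, threshold = 0.0, 0.0
    for pmer, seg in scored:
        if max_pmer is not None and pmer > max_pmer:
            break  # an explicit threshold was given and has been reached
        if total_s + seg.duration_s > budget_s:
            break  # adding this segment would exceed the data budget
        selected.append(seg.seg_id)
        total_s += seg.duration_s
        threshold = pmer
    return selected, threshold
```

For example, with three one-hour segments whose PMERs are 0, 1/3 and 2/3, a two-hour budget keeps the first two and reports an effective threshold of 1/3. Ranking by PMER rather than applying a fixed cut-off is one way to make different selection schemes yield the same amount of data, which is what the comparison described above requires.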


DOI: 10.21437/Interspeech.2016-462

Cite as

Lanchantin, P., Gales, M.J., Karanasou, P., Liu, X., Qian, Y., Wang, L., Woodland, P., Zhang, C. (2016) Selection of Multi-Genre Broadcast Data for the Training of Automatic Speech Recognition Systems. Proc. Interspeech 2016, 3057-3061.

BibTeX
@inproceedings{Lanchantin+2016,
author={P. Lanchantin and Mark J.F. Gales and Penny Karanasou and X. Liu and Y. Qian and L. Wang and P.C. Woodland and C. Zhang},
title={Selection of Multi-Genre Broadcast Data for the Training of Automatic Speech Recognition Systems},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-462},
url={http://dx.doi.org/10.21437/Interspeech.2016-462},
pages={3057--3061}
}