Previous work on self-training of acoustic models with unlabeled data reported significant reductions in WER, but assumed that a large phonetic dictionary was available. Here we assume that only the words appearing in ten hours of transcribed speech are initially available. We are subsequently given a large vocabulary and quantify the value of repeating self-training with this larger dictionary. This experiment is used to analyze the effects of self-training on different categories of words. We report the following findings: (i) Although the small 5k vocabulary raises WER by 2% absolute, self-training is as effective with it as with a large 75k vocabulary. (ii) Adding all 75k words to the decoding vocabulary after self-training reduces the WER degradation to only 0.8% absolute. (iii) Self-training benefits, by a wide margin, those words that appear in the unlabeled audio but not in the transcribed data.
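The abstract refers to the general acoustic-model self-training procedure: decode unlabeled audio with a seed model and a limited decoding vocabulary, keep confident automatic transcripts, and retrain on them. The sketch below illustrates that loop only; it is not the authors' implementation, and the `decode`/`retrain` callables, the confidence threshold, and the iteration count are all hypothetical placeholders.

```python
# Minimal sketch of a generic acoustic-model self-training loop.
# All names and parameters here are illustrative assumptions, not the
# system described in the paper.

from typing import Callable, Iterable, List, Tuple

Hypothesis = Tuple[str, float]  # (automatic transcript, confidence score)


def self_train(
    seed_model,
    unlabeled_audio: Iterable[object],
    decode: Callable[[object, object], Hypothesis],               # (model, utterance) -> hypothesis
    retrain: Callable[[object, List[Tuple[object, str]]], object],  # (model, labeled data) -> new model
    confidence_threshold: float = 0.7,  # hypothetical cutoff for keeping hypotheses
    iterations: int = 2,                # hypothetical number of self-training rounds
):
    """Iteratively label unlabeled audio with the current model and retrain on it."""
    utterances = list(unlabeled_audio)  # materialize so we can reuse across iterations
    model = seed_model
    for _ in range(iterations):
        auto_labeled = []
        for utt in utterances:
            transcript, confidence = decode(model, utt)
            if confidence >= confidence_threshold:
                auto_labeled.append((utt, transcript))
        model = retrain(model, auto_labeled)
    return model
```

In this framing, the experiment in the paper amounts to varying the decoding vocabulary used inside `decode` (5k words seen in the transcribed ten hours versus a 75k-word dictionary) and measuring WER after retraining.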
Cite as: Novotney, S., Schwartz, R. (2009) Analysis of low-resource acoustic model self-training. Proc. Interspeech 2009, 244-247, doi: 10.21437/Interspeech.2009-86
@inproceedings{novotney09_interspeech,
  author={Scott Novotney and Richard Schwartz},
  title={{Analysis of low-resource acoustic model self-training}},
  year={2009},
  booktitle={Proc. Interspeech 2009},
  pages={244--247},
  doi={10.21437/Interspeech.2009-86}
}