Previous work on self-training of acoustic models using unlabeled data reported significant reductions in WER assuming a large phonetic dictionary was available. We now assume only those words from ten hours of speech are initially available. Subsequently, we are then given a large vocabulary and then quantify the value of repeating self-training with this larger dictionary. This experiment is used to analyze the effects of self-training on categories of words. We report the following findings: (i) Although the small 5k vocabulary raises WER by 2% absolute, self-training is equally effective as using a large 75k vocabulary. (ii) Adding all 75k words to the decoding vocabulary after self-training reduces the WER degradation to only 0.8% absolute. (iii) Self-training most benefits those words in the unlabeled audio but not transcribed by a wide margin.
Bibliographic reference. Novotney, Scott / Schwartz, Richard (2009): "Analysis of low-resource acoustic model self-training", In INTERSPEECH-2009, 244-247.