10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Analysis of Low-Resource Acoustic Model Self-Training

Scott Novotney, Richard Schwartz

BBN Technologies, USA

Previous work on self-training of acoustic models using unlabeled data reported significant reductions in WER, assuming a large phonetic dictionary was available. We now assume that only the words occurring in ten hours of transcribed speech are initially available. We are subsequently given a large vocabulary and quantify the value of repeating self-training with this larger dictionary. This experiment is used to analyze the effects of self-training on categories of words. We report the following findings: (i) Although the small 5k vocabulary raises WER by 2% absolute, self-training is just as effective as with a large 75k vocabulary. (ii) Adding all 75k words to the decoding vocabulary after self-training reduces the WER degradation to only 0.8% absolute. (iii) By a wide margin, self-training most benefits those words that occur in the unlabeled audio but not in the initial transcripts.
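To make the procedure concrete, below is a minimal Python sketch of one round of acoustic-model self-training as summarized in the abstract: train a seed model on the transcribed data, decode the unlabeled audio, keep confident hypotheses as automatic transcripts, and retrain on the union. The helpers train_acoustic_model and decode, the Hypothesis type, and the 0.7 confidence threshold are all hypothetical placeholders, not the authors' actual BBN pipeline.

from dataclasses import dataclass
from typing import List

@dataclass
class Hypothesis:
    text: str          # 1-best word string for an utterance
    confidence: float  # decoder confidence in [0, 1]

def train_acoustic_model(audio: List[str], transcripts: List[str],
                         vocabulary: set):
    """Placeholder: train an acoustic model on (audio, transcript) pairs
    using the given decoding vocabulary. A real system would invoke an
    ASR toolkit here."""
    raise NotImplementedError

def decode(model, utterance: str, vocabulary: set) -> Hypothesis:
    """Placeholder: run LVCSR decoding on one utterance and return the
    1-best hypothesis with a confidence score."""
    raise NotImplementedError

def self_train(labeled_audio: List[str], transcripts: List[str],
               unlabeled_audio: List[str], vocabulary: set,
               threshold: float = 0.7):
    """One round of self-training on unlabeled audio."""
    model = train_acoustic_model(labeled_audio, transcripts, vocabulary)
    hyps = [decode(model, utt, vocabulary) for utt in unlabeled_audio]
    # Keep only confidently decoded utterances as automatic transcripts.
    kept = [(utt, h.text) for utt, h in zip(unlabeled_audio, hyps)
            if h.confidence >= threshold]
    audio = labeled_audio + [utt for utt, _ in kept]
    text = transcripts + [txt for _, txt in kept]
    return train_acoustic_model(audio, text, vocabulary)

The key experimental variable in the paper is the vocabulary argument: the same loop is run first with the small 5k seed vocabulary and then repeated with the larger 75k dictionary.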


Bibliographic reference. Novotney, Scott / Schwartz, Richard (2009): "Analysis of low-resource acoustic model self-training", in INTERSPEECH-2009, 244-247.