ISCA Archive Interspeech 2009

Analysis of low-resource acoustic model self-training

Scott Novotney, Richard Schwartz

Previous work on self-training of acoustic models using unlabeled data reported significant reductions in WER, assuming a large phonetic dictionary was available. We now assume that only those words from ten hours of speech are initially available. We are subsequently given a large vocabulary and quantify the value of repeating self-training with this larger dictionary. This experiment is used to analyze the effects of self-training on categories of words. We report the following findings: (i) Although the small 5k vocabulary raises WER by 2% absolute, self-training is as effective as with a large 75k vocabulary. (ii) Adding all 75k words to the decoding vocabulary after self-training reduces the WER degradation to only 0.8% absolute. (iii) By a wide margin, self-training most benefits those words that appear in the unlabeled audio but were not transcribed.

doi: 10.21437/Interspeech.2009-86

Cite as: Novotney, S., Schwartz, R. (2009) Analysis of low-resource acoustic model self-training. Proc. Interspeech 2009, 244-247, doi: 10.21437/Interspeech.2009-86

  author={Scott Novotney and Richard Schwartz},
  title={{Analysis of low-resource acoustic model self-training}},
  booktitle={Proc. Interspeech 2009},