9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Unsupervised versus Supervised Training of Acoustic Models

Jeff Ma, Richard Schwartz

BBN Technologies, USA

In this paper we report unsupervised training experiments conducted on large amounts of English Fisher conversational telephone speech. Much work has been reported on unsupervised training; what distinguishes this work is that we compare the behavior of unsupervised and supervised training on exactly the same data. The comparison yields two surprising results. First, as the amount of training data increases, unsupervised training, even when bootstrapped with a very limited amount (1 hour) of manually transcribed data, improves recognition performance faster than supervised training does, and its performance converges to that of supervised training. Second, bootstrapping unsupervised training with more manually transcribed data makes little difference when a large amount of untranscribed data is available.
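
The bootstrapping idea described in the abstract can be illustrated with a toy self-training loop. This is only a sketch under loud assumptions: it is not the authors' recipe, the nearest-centroid classifier on one-dimensional features stands in for an acoustic model, and the names `train`, `predict`, `self_train`, `seed`, and `unlabeled` are hypothetical. A real system would decode untranscribed audio with a speech recognizer and would typically filter the automatic transcripts, for example by confidence; neither step is modeled here.

```python
from statistics import mean


def train(examples):
    """Fit a nearest-centroid model: one mean feature value per label."""
    by_label = {}
    for x, y in examples:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda y: abs(model[y] - x))


def self_train(seed, unlabeled, rounds=3):
    """Bootstrap from a small labeled seed, then retrain on seed plus auto-labeled data."""
    model = train(seed)  # bootstrap model (assumption: analogous to the 1-hour manual set)
    for _ in range(rounds):
        # Label the unlabeled pool with the current model (analogous to automatic transcripts).
        auto = [(x, predict(model, x)) for x in unlabeled]
        # Retrain on the manual seed together with the automatically labeled pool.
        model = train(seed + auto)
    return model


# Toy data (hypothetical): two manually labeled points and six unlabeled points.
seed = [(0.0, "a"), (10.0, "b")]
unlabeled = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
model = self_train(seed, unlabeled)
```

In this toy, the centroids move from the seed values toward the centers of the unlabeled clusters, which loosely mirrors how a bootstrap model can be refined with untranscribed data.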

Bibliographic reference. Ma, Jeff / Schwartz, Richard (2008): "Unsupervised versus supervised training of acoustic models", in INTERSPEECH-2008, 2374-2377.