In this paper, we introduce an improved method for unsupervised training where the data selection or filtering process is done on state level. We describe in detail the setup of the experiments and introduce the state confidence scores on word and allophone state level for performing the data selection for mixture training on state level. Although we are using a relatively small amount of 180 hours of untranscribed recordings in addition to the available carefully manually transcribed transcriptions of 100 hours, we are able to significantly improve our final speaker adaptive acoustic model. Furthermore, we present promising results by doing system combination using the acoustic models trained on different confidence thresholds. These methods are evaluated on the EPPS corpus starting from the RWTH European English parliamentary speech transcription system. A significant improvement of 7% relative is achieved using less data for unsupervised training than conventional systems require.
Bibliographic reference. Gollan, Christian / Hahn, Stefan / Schlüter, Ralf / Ney, Hermann (2007): "An improved method for unsupervised training of LVCSR systems", In INTERSPEECH-2007, 2101-2104.