Sixth European Conference on Speech Communication and Technology
Current speech recognition systems require large amounts of transcribed data for parameter estimation. The transcription, however, is tedious and expensive. In this work we describe experiments aimed at training a speech recognizer with only a minimal amount (30 minutes) of transcriptions and a large portion (50 hours) of untranscribed data. A recognizer is bootstrapped on the transcribed part of the data, and initial transcripts for the remainder (the untranscribed part) are generated with it. Using a lattice-based confidence measure, the recognition errors are (partially) detected, and the remaining hypotheses are used for training. Using this scheme, the word error rate on a broadcast news speech recognition task dropped from more than 32.0% to 21.4%. In a cheating experiment we show that this performance cannot be significantly improved by improving the confidence measure. By combining the unsupervisedly trained system with our currently best recognizer, which is trained on 15.5 hours of transcribed data, an additional relative error reduction of 5% (compared to the system trained in the standard fashion) is possible.
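The core of the scheme described above is confidence-based filtering of automatically generated transcripts before retraining. The following is a minimal illustrative sketch of that filtering step only; all names and the threshold value are assumptions for illustration, not the paper's actual implementation (which uses lattice-based confidence scores from a broadcast-news HMM recognizer).

```python
def select_training_hypotheses(hypotheses, threshold):
    """Keep only word hypotheses whose confidence meets the threshold.

    `hypotheses` is a list of (word, confidence) pairs with confidence
    in [0, 1]. Low-confidence words (likely recognition errors) are
    discarded so they do not corrupt the retraining data.
    """
    return [word for word, conf in hypotheses if conf >= threshold]


# Hypothetical decoded utterance with per-word confidence scores:
decoded = [("the", 0.97), ("stock", 0.91), ("market", 0.88),
           ("fell", 0.42), ("today", 0.95)]
print(select_training_hypotheses(decoded, threshold=0.8))
# → ['the', 'stock', 'market', 'today']
```

In the training loop, the surviving hypotheses would serve as pseudo-transcripts for the 50 hours of untranscribed audio, and the recognizer would be re-estimated on them together with the 30 minutes of manually transcribed data.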
Bibliographic reference. Kemp, Thomas / Waibel, Alex (1999): "Unsupervised training of a speech recognizer: recent experiments", In EUROSPEECH'99, 2725-2728.