5th International Conference on Spoken Language Processing
Current speech recognition systems require large amounts of expensive transcribed data for parameter estimation. In this work we describe experiments aimed at training a speech recognizer without transcriptions. The experiments were carried out on untranscribed TV newscast recordings, which were automatically divided into segments with similar acoustic background conditions. We develop a training scheme in which a recognizer is bootstrapped using very little transcribed data and is then improved using new, untranscribed speech. We show that it is necessary to use a confidence measure to judge the recognizer's initial transcriptions before using them for training. Larger improvements can be achieved if the number of parameters in the system is increased as more data becomes available. We also show that the beneficial effect of unsupervised training is not compensated for by MLLR adaptation on the hypothesis. Using the described methods, we found that untranscribed data gives roughly one third of the improvement obtained from transcribed material.
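The confidence-based selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hypothesis` record, the segment-level score in [0, 1], the `select_for_training` helper, and the threshold value are all hypothetical names and numbers chosen for illustration. The sketch only shows the general idea of keeping automatically produced transcriptions whose confidence is high enough before adding them to the training set.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One automatically transcribed segment (hypothetical structure)."""
    segment_id: str
    text: str
    confidence: float  # assumed to be a score in [0, 1]


def select_for_training(hyps: List[Hypothesis], threshold: float) -> List[Hypothesis]:
    """Keep only hypotheses whose confidence reaches the threshold."""
    return [h for h in hyps if h.confidence >= threshold]


# Made-up recognizer output for three untranscribed segments.
hyps = [
    Hypothesis("seg1", "good evening", 0.92),
    Hypothesis("seg2", "the whether today", 0.41),
    Hypothesis("seg3", "here is the news", 0.78),
]

# The threshold 0.7 is an arbitrary illustrative value.
selected = select_for_training(hyps, threshold=0.7)
for h in selected:
    print(h.segment_id, h.text)
```

In a full scheme of this kind, the selected segments would be added to the small transcribed set, the recognizer would be retrained, and the decode-and-select cycle could be repeated on further untranscribed data.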
Bibliographic reference. Kemp, Thomas / Waibel, Alex (1998): "Unsupervised training of a speech recognizer using TV broadcasts", In ICSLP-1998, paper 0758.