The performance of unsupervised discriminative training has been found to be highly dependent on the accuracy of the initial automatic transcription. This paper examines a strategy in which a relatively small amount of poorly recognised data is manually transcribed to supplement the automatically transcribed data. Experiments were carried out on a Mandarin broadcast transcription task using both Broadcast News (BN) and Broadcast Conversation (BC) data. A range of experimental conditions is compared for both maximum likelihood and discriminative training using directed manual transcription. For BC data, fully unsupervised discriminative training obtains only 17% of the reduction in character error rate (CER) achieved by supervised training. Automatically selecting 18% of the data for manual transcription yields 50% of the CER gain from supervised training. The directed approach to selecting data outperforms the use of a random set of data for manual transcription.
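The percentages above compare each training condition's CER reduction with the reduction obtained by fully supervised training. A minimal sketch of that ratio follows, assuming all three CERs are measured on the same test set against a common baseline system; the function name and the numbers in the usage lines are hypothetical illustrations, not results from the paper.

```python
def fraction_of_supervised_gain(cer_baseline, cer_method, cer_supervised):
    """Share of the supervised CER reduction that a given method recovers.

    All arguments are character error rates in the same units (e.g. percent).
    Assumption: the baseline is the system before the training step under
    comparison, and supervised training improves on it.
    """
    supervised_gain = cer_baseline - cer_supervised
    if supervised_gain <= 0:
        raise ValueError("supervised training must reduce CER relative to the baseline")
    return (cer_baseline - cer_method) / supervised_gain


# Hypothetical CER values chosen only to illustrate the arithmetic.
example = fraction_of_supervised_gain(cer_baseline=20.0, cer_method=19.0, cer_supervised=18.0)
print(f"{example:.0%} of the supervised CER gain recovered")
```

A value of 0 means the method gives no improvement over the baseline, and a value of 1 means it matches supervised training.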
Bibliographic reference. Yu, K. / Gales, M. J. F. / Woodland, P. C. (2007): "Unsupervised training with directed manual transcription for recognising Mandarin broadcast audio", In INTERSPEECH-2007, 1709-1712.