This paper describes a lattice-based risk minimization training method for unsupervised language model (LM) adaptation. In a broadcast archiving system, unsupervised LM adaptation using transcriptions generated by speech recognition is considered useful for improving performance. However, conventional linear interpolation methods occasionally degrade performance because of recognition errors in the training transcriptions. Accordingly, we propose a new adaptation method that aims to reflect error information contained in the training lattices. The method minimizes the overall risk over the training lattices to yield a log-linear model composed of a set of linguistic features. The advantage of the method is that the model parameters can be estimated efficiently in an unsupervised manner. Experimental results on transcribing Japanese broadcast news showed a significant word error rate reduction compared with conventional mixture LMs.
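The core idea of risk minimization over a log-linear model can be illustrated with a toy sketch. Everything below is an assumption for illustration, not the paper's actual setup: the lattice is approximated by a tiny n-best list, the two features (LM log-probability and word count) are hypothetical stand-ins for the paper's linguistic features, the unsupervised risk is a surrogate (expected pairwise word edit distance under the model posterior, since no reference transcript exists), and the optimizer is plain finite-difference gradient descent.

```python
import math

# Toy "lattice" as an n-best list: each hypothesis is (words, feature vector).
# Features (hypothetical): [LM log-probability, word count].
HYPS = [
    (["the", "news", "tonight"], [-4.0, 3.0]),
    (["a", "news", "tonight"],   [-5.5, 3.0]),
    (["the", "new", "tonight"],  [-5.0, 3.0]),
]

def edit_distance(a, b):
    # Standard word-level Levenshtein distance.
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d[m][n]

def posteriors(lam):
    # Log-linear model: p(h) proportional to exp(lam . f(h)).
    scores = [sum(l * f for l, f in zip(lam, feats)) for _, feats in HYPS]
    mx = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def expected_risk(lam):
    # Unsupervised surrogate risk: expected pairwise word edit distance
    # under the model posterior (no reference transcript is available).
    p = posteriors(lam)
    return sum(p[i] * p[j] * edit_distance(HYPS[i][0], HYPS[j][0])
               for i in range(len(HYPS)) for j in range(len(HYPS)))

def train(lam, lr=0.1, steps=50, eps=1e-4):
    # Minimize expected risk by finite-difference gradient descent,
    # keeping the best parameters seen so far.
    lam = list(lam)
    best, best_r = list(lam), expected_risk(lam)
    for _ in range(steps):
        grad = []
        for k in range(len(lam)):
            bumped = list(lam)
            bumped[k] += eps
            grad.append((expected_risk(bumped) - expected_risk(lam)) / eps)
        lam = [l - lr * g for l, g in zip(lam, grad)]
        r = expected_risk(lam)
        if r < best_r:
            best, best_r = list(lam), r
    return best
```

Training should drive the posterior mass toward mutually consistent hypotheses, lowering the expected risk relative to the initial parameters; a real system would, of course, operate on full lattices with richer features.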
Bibliographic reference. Kobayashi, Akio / Oku, Takahiro / Homma, Shinichi / Imai, Toru / Nakagawa, Seiichi (2011): "Lattice-based risk minimization training for unsupervised language model adaptation", In INTERSPEECH-2011, 1453-1456.