This paper examines the application of lattice adaptation techniques to speaker-dependent models for conversational telephone speech transcription. Given sufficient training data per speaker, it is feasible to build adapted speaker-dependent models using lattice MLLR and lattice MAP. Experiments on iterative and cascaded adaptation are presented. Additionally, various strategies for thresholding frame posteriors are investigated, and it is shown that accumulating statistics from the local best-confidence path is sufficient to achieve optimal adaptation. Overall, an iterative cascaded lattice system reduced word error rate (WER) by 7.0% absolute, a 0.8% absolute gain over transcript-based adaptation. Lattice adaptation reduced the gap between unsupervised and supervised adaptation from 2.5% to 1.7%.
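As a rough illustration of what accumulating statistics from a local best-confidence path with a posterior threshold could look like, the following minimal sketch keeps, for each frame, only the state with the highest posterior and discards frames whose best posterior falls below a threshold. The data layout, the function name `accumulate_best_confidence`, the scalar observations, the posterior weighting, and the threshold value are all assumptions made for illustration; the abstract does not specify the paper's actual procedure.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: one dict per frame, mapping a state id to its
# lattice posterior for that frame. Not taken from the paper.
FramePosteriors = Dict[str, float]


def accumulate_best_confidence(
    frames: List[FramePosteriors],
    observations: List[float],
    threshold: float = 0.5,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Accumulate occupancy and first-order statistics per state.

    For each frame, only the highest-posterior state is used, and the
    frame is skipped if that posterior is below `threshold`.
    Observations are scalars here to keep the sketch short.
    """
    occupancy: Dict[str, float] = {}
    first_order: Dict[str, float] = {}
    for posteriors, x in zip(frames, observations):
        if not posteriors:
            continue  # no lattice arcs cover this frame
        state, gamma = max(posteriors.items(), key=lambda kv: kv[1])
        if gamma < threshold:
            continue  # frame is too uncertain to contribute
        # Assumption: the frame is weighted by its posterior.
        occupancy[state] = occupancy.get(state, 0.0) + gamma
        first_order[state] = first_order.get(state, 0.0) + gamma * x
    return occupancy, first_order


frames = [
    {"a": 0.75, "b": 0.25},
    {"a": 0.25, "b": 0.75},
    {"a": 0.5, "b": 0.5},  # best posterior is below the threshold
]
occ, first = accumulate_best_confidence(frames, [2.0, 4.0, 8.0], threshold=0.6)
```

Statistics of this kind would then feed an MLLR transform estimate or a MAP update of the model means; that step is omitted here.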
Bibliographic reference. Thambiratnam, K. / Seide, F. (2009): "Unsupervised lattice-based acoustic model adaptation for speaker-dependent conversational telephone speech transcription", In INTERSPEECH-2009, 1611-1614.