10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Unsupervised Lattice-Based Acoustic Model Adaptation for Speaker-Dependent Conversational Telephone Speech Transcription

K. Thambiratnam, F. Seide

Microsoft Research Asia, China

This paper examines the application of lattice adaptation techniques to speaker-dependent models for conversational telephone speech transcription. Given sufficient training data per speaker, it is feasible to build adapted speaker-dependent models using lattice MLLR and lattice MAP. Experiments on iterative and cascaded adaptation are presented. Additionally, various strategies for thresholding frame posteriors are investigated, and it is shown that accumulating statistics from the local best-confidence path is sufficient to achieve optimal adaptation. Overall, an iterative cascaded lattice system reduced WER by 7.0% absolute, a 0.8% absolute gain over transcript-based adaptation. Lattice adaptation reduced the gap between unsupervised and supervised adaptation from 2.5% to 1.7%.
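
The posterior-thresholding strategies mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `accumulate_stats`, the strategy labels, the scalar observations, and the choice to weight the best-confidence state by its posterior are all assumptions made for illustration. It shows only how per-state occupancy and first-order statistics might be accumulated from frame posteriors under three selection rules.

```python
from collections import defaultdict


def accumulate_stats(frames, strategy="best", threshold=0.0):
    """Accumulate per-state occupancy and weighted observation sums.

    frames: list of (observation, {state_id: posterior}) pairs, one per frame.
    strategy (hypothetical labels, not from the paper):
      "all"       - every lattice hypothesis contributes, weighted by posterior
      "threshold" - only hypotheses with posterior >= threshold contribute
      "best"      - only the highest-posterior hypothesis at each frame
                    contributes (a local best-confidence path)
    Returns {state_id: (occupancy, weighted_sum)}.
    """
    occupancy = defaultdict(float)
    weighted_sum = defaultdict(float)
    for observation, posteriors in frames:
        if not posteriors:
            continue  # no lattice evidence at this frame
        if strategy == "best":
            best = max(posteriors, key=posteriors.get)
            kept = {best: posteriors[best]}
        elif strategy == "threshold":
            kept = {s: p for s, p in posteriors.items() if p >= threshold}
        elif strategy == "all":
            kept = posteriors
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        for state, posterior in kept.items():
            # Assumption: kept hypotheses are weighted by their posterior.
            occupancy[state] += posterior
            weighted_sum[state] += posterior * observation
    return {s: (occupancy[s], weighted_sum[s]) for s in occupancy}


# Toy two-frame example with made-up posteriors.
frames = [
    (1.0, {"a": 0.75, "b": 0.25}),
    (3.0, {"a": 0.25, "b": 0.75}),
]
best_stats = accumulate_stats(frames, strategy="best")
all_stats = accumulate_stats(frames, strategy="all")
```

In a real system the accumulated statistics would feed MLLR transform estimation or MAP mean updates; that step is omitted here.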


Bibliographic reference.  Thambiratnam, K. / Seide, F. (2009): "Unsupervised lattice-based acoustic model adaptation for speaker-dependent conversational telephone speech transcription", In INTERSPEECH-2009, 1611-1614.