This paper deals with unsupervised feature-based speaker adaptation techniques. The goal is to design an optimal adaptation approach for improving the recognition accuracy of a large-vocabulary continuous speech recognition (LVCSR) system developed for the automatic transcription of large archives of spoken Czech (e.g., archives of parliamentary proceedings and historical archives of Czech broadcasters). For this purpose, several modifications of vocal tract length normalization (VTLN) and constrained maximum likelihood linear regression (CMLLR) were investigated and combined. The study covers the use of these methods both during recognition and for building a normalized acoustic model within the speaker adaptive training (SAT) scheme. The methods were evaluated experimentally on a large and varied data set totalling 93k words. The resulting two-step adaptation scheme yields a significant WER reduction from 17.8% to 14.8% (3.0 points absolute, about 17% relative).
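The abstract names VTLN without describing it. As a generic illustration only (not the authors' implementation), VTLN is commonly realized by warping the filterbank frequency axis with a speaker-specific factor alpha, chosen by a grid search that maximizes the likelihood of the speaker's data. The following sketch assumes a 16 kHz sampling rate, a piecewise-linear warp with a hypothetical breakpoint at 80% of the Nyquist frequency, an assumed alpha grid of 0.80 to 1.20 in steps of 0.02, and a caller-supplied scoring function; the names `warp_frequency` and `select_warp_factor` are hypothetical.

```python
"""Minimal, hypothetical sketch of piecewise-linear VTLN (not the paper's code)."""

NYQUIST_HZ = 8000.0  # assumption: 16 kHz sampling rate
BREAK_RATIO = 0.8    # assumption: breakpoint at 80% of Nyquist


def warp_frequency(f_hz: float, alpha: float,
                   nyquist: float = NYQUIST_HZ,
                   break_ratio: float = BREAK_RATIO) -> float:
    """Scale frequencies by alpha below the breakpoint, then interpolate
    linearly so that the Nyquist frequency maps onto itself."""
    if not 0.0 <= f_hz <= nyquist:
        raise ValueError("frequency must lie in [0, nyquist]")
    b = break_ratio * nyquist
    if alpha * b >= nyquist:
        # The upper segment would not be monotonic for such a large alpha.
        raise ValueError("alpha too large for this breakpoint")
    if f_hz <= b:
        return alpha * f_hz
    # Upper segment: straight line from (b, alpha * b) to (nyquist, nyquist).
    slope = (nyquist - alpha * b) / (nyquist - b)
    return alpha * b + slope * (f_hz - b)


def select_warp_factor(score, alphas=None):
    """Grid search: return the alpha with the highest score(alpha), where
    score is assumed to be e.g. the acoustic log-likelihood of one
    speaker's data after warping with that alpha."""
    if alphas is None:
        # Assumed grid 0.80, 0.82, ..., 1.20 (rounded to avoid float noise).
        alphas = [round(0.80 + 0.02 * i, 2) for i in range(21)]
    return max(alphas, key=score)


if __name__ == "__main__":
    # Toy score peaking at alpha = 1.06, standing in for a real likelihood.
    best = select_warp_factor(lambda a: -(a - 1.06) ** 2)
    print(best, warp_frequency(1000.0, best))
```

In a real system, the score would come from decoding or forced alignment with the acoustic model, and the warped frequencies would position the mel filterbank before feature extraction; the toy quadratic score here only exercises the search logic.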
Bibliographic reference. Cerva, Petr / Palecek, Karel / Silovsky, Jan / Nouza, Jan (2011): "Using unsupervised feature-based speaker adaptation for improved transcription of spoken archives", In INTERSPEECH-2011, 2565-2568.