EUROSPEECH 2003 - INTERSPEECH 2003
In this paper, we propose a new method for generating regression classes for MLLR speaker adaptation. It uses the phonetic decision tree-based successive state splitting (PDTSSS) algorithm so that speaker characteristics are represented effectively. The method extends state splitting by clustering the context components of the adaptation data into a tree structure. It can then autonomously control the number of adaptation parameters (means and variances) according to the context information and the amount of adaptation utterances available from a new speaker. In our experiments, phone and word recognition accuracy with adaptation was on average 34-37% and 9% higher, respectively, than with the speaker-independent acoustic models. The Korean phone and word recognition results confirmed a significant performance gain over recognition without speaker adaptation, even when only a small number of adaptation utterances was available.
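The abstract does not spell out the class-selection procedure. The sketch below is only a rough illustration of how a tree of regression classes lets the number of MLLR transforms grow with the amount of adaptation data. It uses a generic occupancy-threshold tree cut of the kind often paired with MLLR regression class trees. It is not the PDTSSS clustering itself: the tree, the node names, the occupancy counts and the threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a hypothetical regression-class tree over Gaussian components."""
    name: str
    occupancy: float  # adaptation frames aligned to Gaussians under this node
    children: List["Node"] = field(default_factory=list)


def select_regression_classes(node: Node, threshold: float) -> List[str]:
    """Return names of the nodes whose MLLR transforms would be estimated.

    A child becomes its own regression class only if it has at least
    `threshold` frames of adaptation data. Data-poor children fall back to
    their parent's transform, which is estimated from all data under the
    parent. An empty list means there is too little data to adapt at all.
    """
    if node.occupancy < threshold:
        return []
    rich = [c for c in node.children if c.occupancy >= threshold]
    if not rich:
        # Leaf, or no child has enough data: one transform for the whole subtree.
        return [node.name]
    classes: List[str] = []
    for child in rich:
        classes.extend(select_regression_classes(child, threshold))
    if len(rich) < len(node.children):
        # Some children are data-poor, so this node's transform is still needed.
        classes.append(node.name)
    return classes


def make_example_tree() -> Node:
    """Toy tree with made-up occupancy counts (illustrative only)."""
    return Node("root", 1000, [
        Node("A", 600, [Node("A1", 400), Node("A2", 200)]),
        Node("B", 400, [Node("B1", 350), Node("B2", 50)]),
    ])


if __name__ == "__main__":
    tree = make_example_tree()
    # Lower thresholds (or more adaptation data) yield more, finer classes.
    for t in (100, 300, 500, 2000):
        print(t, select_regression_classes(tree, t))
```

With these toy counts, a threshold of 300 yields four classes (A1, A, B1, B), while a threshold of 500 yields only two (A, root). Collecting more adaptation frames has the same effect as lowering the threshold: the cut moves deeper into the tree and more transforms are estimated. This is roughly the kind of data-dependent control over adaptation parameters the abstract describes.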
Bibliographic reference. Oh, Se-Jin / Kim, Kwang-Dong / Roh, Duk-Gyoo / Sung, Woo-Chang / Chung, Hyun-Yeol (2003): "Speaker adaptation using regression classes generated by phonetic decision tree-based successive state splitting", In EUROSPEECH-2003, 1457-1460.