INTERSPEECH 2006 - ICSLP
In this paper, regression-tree-based spectral peak alignment is proposed for rapid speaker adaptation using the linearization of VTLN. Two different types of regression classes are investigated: phonetic classes (built by combining knowledge-based and data-driven techniques) and mixture classes. Compared to MLLR and VTLN, the proposed method yields improved performance for both supervised and unsupervised adaptation on both medium-vocabulary and connected-digit recognition tasks. To further improve performance, MLLR is integrated into the regression-tree-based peak alignment. Experimental results show that performance improvements can be achieved even with limited adaptation data.
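As an illustration of the term "linearization of VTLN": frequency warping of a log spectrum can be written as a linear transform acting directly on cepstral features. The sketch below is not the authors' implementation. It assumes uniformly spaced frequency bins (no Mel filter bank), a single linear warping factor `alpha`, linear interpolation between bins, and untruncated cepstra from an orthonormal DCT. All function names are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix C (n x n), so that C times its transpose is the identity."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * k * (m + 0.5) / n) for m in range(n)])
    return rows

def warp_matrix(n, alpha):
    """Interpolation matrix W: row i samples the spectrum at warped bin alpha * i."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        pos = min(alpha * i, n - 1)  # clip to the last bin (assumed boundary rule)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        W[i][lo] += 1.0 - frac
        W[i][hi] += frac
    return W

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def cepstral_warp_transform(n, alpha):
    """Linear transform A = C W C^T that applies the warp in the cepstral domain."""
    C = dct_matrix(n)
    return matmul(matmul(C, warp_matrix(n, alpha)), transpose(C))

# Check: warping via cepstra matches warping the log spectrum directly.
n = 8
log_spec = [math.sin(0.7 * i) + 0.1 * i for i in range(n)]  # toy log spectrum
C = dct_matrix(n)
cep = matvec(C, log_spec)
A = cepstral_warp_transform(n, 0.9)
via_cepstra = matvec(A, cep)
via_spectrum = matvec(C, matvec(warp_matrix(n, 0.9), log_spec))
max_diff = max(abs(a - b) for a, b in zip(via_cepstra, via_spectrum))
```

Because the warp becomes a matrix acting on cepstra, it can be applied to model parameters in the same way as other linear transforms. In a regression-tree setting, a separate warping factor could in principle be estimated per regression class, although how the paper estimates and ties these transforms is not specified in the abstract.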
Bibliographic reference. Wang, Shizhen / Cui, Xiaodong / Alwan, Abeer (2006): "Rapid speaker adaptation using regression-tree based spectral peak alignment", In INTERSPEECH-2006, paper 1334-Wed1A2O.1.