Interspeech'2005 - Eurospeech
Eigenvoice (EV) speaker adaptation has been shown effective for fast speaker adaptation when the amount of adaptation data is scarce. In the past two years, we have been investigating the application of kernel methods to improve EV speaker adaptation by exploiting possible nonlinearity in the speaker space, and two methods were proposed: embedded kernel eigenvoice (eKEV) and kernel eigenspace-based MLLR (KEMLLR). In both methods, kernel PCA is used to derive eigenvoices in the kernel-induced high-dimensional feature space, and they differ mainly in the representation of the speaker models. Both had been shown to outperform all other common adaptation methods when the amount of adaptation data is less than 10s. However, in the past, only small vocabulary speech recognition tasks were tried since we were not familiar with the behaviour of these kernelized methods. As we gain more experience, we are now ready to tackle larger vocabularies. In this paper, we show that both methods continue to outperform MAP, and MLLR when only 5s or 10s of adaptation data are available on the WSJ0 5K-vocabulary task. Compared with the speaker-independent model, the two methods reduce recognition word error rate by 13.4%-21.1%.
Bibliographic reference. Hsiao, Roger / Mak, Brian (2005): "A comparative study of two kernel eigenspace-based speaker adaptation methods on large vocabulary continuous speech recognition", In INTERSPEECH-2005, 1797-1800.