In this paper, we employ HMM sufficient statistics (HMM-Suff Stat) and N-best speaker selection to realize rapid implementations of Baum-Welch and MLLR adaptation. Only a single arbitrary utterance is required; it is used to select the HMM-Suff Stat of the N-best speakers from the training database as adaptation data. Since the HMM-Suff Stat are pre-computed offline, the computational load is minimized. Moreover, adaptation data from the target speaker is not needed. Rapid Baum-Welch achieves an absolute improvement of 1.8% in word accuracy (WA) over the speaker-independent (SI) model, and rapid MLLR achieves an improvement of 1.1% WA over rapid Baum-Welch adaptation using HMM-Suff Stat. Adaptation takes as little as 6 sec and 7 sec, respectively. Evaluation is done under noisy environment conditions, with the adaptation algorithm integrated into a speech dialogue system. Additional experiments with VTLN, MAP, and conventional MLLR are also performed.
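As a rough illustration of the idea of selecting N-best speakers and pooling their pre-computed statistics, the following Python sketch re-estimates the mean of a single Gaussian. It is not the implementation used in the paper: the per-speaker scores, the data layout, and the names `select_n_best` and `pooled_mean` are all assumptions made for this example, and only the mean update of a Baum-Welch style re-estimation is shown.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: for each training speaker, the pre-computed statistics
# of ONE Gaussian: (occupancy count, first-order sum of the observations).
Stats = Tuple[float, List[float]]


def select_n_best(scores: Dict[str, float], n: int) -> List[str]:
    """Return the n speaker IDs whose score for the utterance is highest.

    The scores are assumed to come from some per-speaker model evaluated
    on the single input utterance (an assumption of this sketch).
    """
    return sorted(scores, key=lambda spk: scores[spk], reverse=True)[:n]


def pooled_mean(stats: Dict[str, Stats], speakers: List[str]) -> List[float]:
    """Re-estimate a Gaussian mean from the pooled statistics of `speakers`."""
    dim = len(stats[speakers[0]][1])
    total_occ = 0.0
    total_first = [0.0] * dim
    for spk in speakers:
        occ, first = stats[spk]
        total_occ += occ
        for d in range(dim):
            total_first[d] += first[d]
    # Mean update: pooled first-order sum divided by pooled occupancy.
    return [s / total_occ for s in total_first]


# Toy values, invented purely for illustration.
scores = {"spk_a": -41.2, "spk_b": -55.0, "spk_c": -39.8}
stats = {
    "spk_a": (10.0, [20.0, 5.0]),
    "spk_b": (8.0, [-16.0, 40.0]),
    "spk_c": (30.0, [90.0, 15.0]),
}

best = select_n_best(scores, n=2)
adapted_mean = pooled_mean(stats, best)
print(best, adapted_mean)
```

Because the statistics are stored per speaker, the only work done at adaptation time in this sketch is the selection and a handful of additions and divisions, which is the intuition behind the low adaptation cost.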
Bibliographic reference. Gomez, Randy / Toda, Tomoki / Saruwatari, Hiroshi / Shikano, Kiyohiro (2007): "Rapid unsupervised speaker adaptation using single utterance based on MLLR and speaker selection", In INTERSPEECH-2007, 262-265.