8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Rapid Unsupervised Speaker Adaptation Using Single Utterance Based on MLLR and Speaker Selection

Randy Gomez, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

NAIST, Japan

In this paper, we employ HMM sufficient statistics (HMM-Suff Stat) and N-best speaker selection to realize rapid implementations of Baum-Welch and MLLR adaptation. Only a single arbitrary utterance from the target speaker is required; it is used to select the HMM-Suff Stat of the N-best speakers from the training database, and these statistics serve as the adaptation data. Since the HMM-Suff Stat are pre-computed offline, the computational load is minimized. Beyond this utterance, no adaptation data from the target speaker is needed. Rapid Baum-Welch adaptation achieves an absolute improvement of 1.8% in word accuracy (WA) over the speaker-independent (SI) model, and rapid MLLR achieves a further improvement of 1.1% WA over rapid Baum-Welch adaptation using HMM-Suff Stat. Adaptation takes as little as 6 sec and 7 sec, respectively. Evaluation is carried out under noisy conditions, with the adaptation algorithm integrated into a speech dialogue system. Additional experiments with VTLN, MAP, and conventional MLLR are also performed.
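
The selection-and-pooling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `SuffStat`, `select_n_best`, and `reestimate` are hypothetical, the per-speaker scores are assumed to be given (for example, likelihoods of the single utterance under per-speaker models, which the abstract does not specify), and each Gaussian is one-dimensional for brevity. It shows only how pre-computed statistics from the selected speakers might be summed to re-estimate means and variances without touching the raw training data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SuffStat:
    """Pre-computed statistics for one Gaussian of one training speaker (1-D, assumed)."""
    occupancy: float     # sum of occupation probabilities
    first_order: float   # occupancy-weighted sum of observations
    second_order: float  # occupancy-weighted sum of squared observations


def select_n_best(scores: Dict[str, float], n: int) -> List[str]:
    """Return the n speaker IDs with the highest score for the test utterance."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def reestimate(stats: Dict[str, List[SuffStat]],
               speakers: List[str]) -> List[Tuple[float, float]]:
    """Pool the selected speakers' statistics; return (mean, variance) per Gaussian."""
    n_gauss = len(stats[speakers[0]])
    params = []
    for g in range(n_gauss):
        occ = sum(stats[s][g].occupancy for s in speakers)
        first = sum(stats[s][g].first_order for s in speakers)
        second = sum(stats[s][g].second_order for s in speakers)
        mean = first / occ
        var = second / occ - mean * mean
        params.append((mean, var))
    return params


# Hypothetical scores of one utterance against three training speakers.
scores = {"spk_a": -10.0, "spk_b": -5.0, "spk_c": -7.0}
stats = {
    "spk_a": [SuffStat(2.0, 2.0, 4.0)],
    "spk_b": [SuffStat(2.0, 4.0, 10.0)],
    "spk_c": [SuffStat(2.0, 8.0, 34.0)],
}
best = select_n_best(scores, 2)
adapted = reestimate(stats, best)
```

Because only sums of stored statistics are involved, the cost of this step grows with the number of selected speakers and Gaussians rather than with the amount of training speech, which is consistent with the short adaptation times the abstract reports.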


Bibliographic reference. Gomez, Randy / Toda, Tomoki / Saruwatari, Hiroshi / Shikano, Kiyohiro (2007): "Rapid unsupervised speaker adaptation using single utterance based on MLLR and speaker selection", in INTERSPEECH-2007, 262-265.