9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Rapid Unsupervised Speaker Adaptation Robust in Reverberant Environment Conditions

Randy Gomez (1), Jani Even (2), Kiyohiro Shikano (2)

(1) Kyoto University, Japan; (2) NAIST, Japan

We extend conventional rapid adaptation based on the sufficient statistics (suff stat) of the N-closest speakers to make it robust under reverberant conditions. We integrate our fast dereverberation technique, based on optimized multi-band spectral subtraction, as a pre-processing step. This removes the late reflection components of the reverberant signal quickly and effectively. Speakers' suff stat are then computed from the processed data and stored offline. The system requires only a single arbitrary utterance, which is used to select the N-closest suff stat for updating the model online. We also investigate the effects that the channel (reverberation) introduces in the acoustic subspace. Finally, we compare the proposed extension with Speaker Adaptive Training (SAT) and Constrained Maximum Likelihood Linear Regression (CMLLR), and evaluate on both artificial and actual reverberant data.
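The selection-and-update step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each stored speaker is summarised by a single diagonal Gaussian (hypothetical fields `mean` and `var`) plus zeroth- and first-order statistics (`count`, `first_order`), that closeness is measured by the average log-likelihood of the utterance frames, and that only the model mean is re-estimated. All names and toy values are illustrative, and the dereverberation pre-processing is assumed to have been applied already.

```python
import math

def avg_log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood under a diagonal Gaussian."""
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def select_n_closest(frames, speakers, n):
    """Return the n stored speakers whose models best explain the utterance."""
    ranked = sorted(
        speakers,
        key=lambda s: avg_log_likelihood(frames, s["mean"], s["var"]),
        reverse=True,  # highest likelihood first
    )
    return ranked[:n]

def update_mean(selected):
    """Re-estimate the mean by pooling the selected sufficient statistics."""
    total_count = sum(s["count"] for s in selected)
    dim = len(selected[0]["first_order"])
    return [
        sum(s["first_order"][d] for s in selected) / total_count
        for d in range(dim)
    ]

# Toy offline database (hypothetical values): per-speaker model and statistics.
speakers = [
    {"name": "A", "mean": [0.0, 0.0], "var": [1.0, 1.0],
     "count": 10.0, "first_order": [0.0, 0.0]},
    {"name": "B", "mean": [1.0, 1.0], "var": [1.0, 1.0],
     "count": 10.0, "first_order": [10.0, 10.0]},
    {"name": "C", "mean": [5.0, 5.0], "var": [1.0, 1.0],
     "count": 10.0, "first_order": [50.0, 50.0]},
]

# Feature frames of the single adaptation utterance (toy values).
utterance = [[0.2, 0.1], [0.0, 0.3]]

selected = select_n_closest(utterance, speakers, n=2)
adapted_mean = update_mean(selected)
print([s["name"] for s in selected], adapted_mean)
```

Because the statistics are precomputed offline, the online cost in this sketch is one scoring pass over the stored speakers followed by a sum and a division, which is what makes adaptation from a single utterance fast.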


Bibliographic reference. Gomez, Randy / Even, Jani / Shikano, Kiyohiro (2008): "Rapid unsupervised speaker adaptation robust in reverberant environment conditions", in INTERSPEECH-2008, 1309-1312.