12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Instantaneous Speaker Adaptation Through Selection and Combination of fMLLR Transformation Matrices

Diego Giuliani, Fabio Brugnara

FBK-irst, Italy

This paper addresses instantaneous speaker adaptation based on feature-space maximum likelihood linear regression (fMLLR) in the context of an automatic transcription task. We investigate fMLLR-based adaptation when the need for a preliminary decoding pass over a speech segment is removed, because the sufficient statistics for estimating the adaptation parameters are gathered with respect to a Gaussian mixture model. To cope with limited adaptation data, in addition to using feature-space maximum a posteriori linear regression (fMAPLR), we investigate estimating the transformation matrix to be applied to the speech segment through selection and combination of pre-computed fMLLR transformation matrices. For speaker-adaptively trained acoustic models, results of recognition experiments show that the proposed approach is moderately better than fMLLR but not as good as fMAPLR.
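
As a rough illustration of the selection-and-combination idea, the following Python sketch scores each pre-computed transform by the likelihood of the transformed segment under a diagonal-covariance Gaussian mixture model, keeps the best-scoring ones, and interpolates them. The function names, the `n_best` parameter, and the softmax weighting over per-frame log-likelihoods are assumptions made for illustration only; the abstract does not specify how the selection or the combination weights are actually computed.

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Total log-likelihood of frames X (T x d) under a diagonal-covariance GMM."""
    diff = X[:, None, :] - means[None, :, :]                      # T x M x d
    log_comp = (np.log(weights)
                - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
                - 0.5 * np.sum(diff ** 2 / variances, axis=2))    # T x M
    m = log_comp.max(axis=1, keepdims=True)                       # log-sum-exp trick
    return float(np.sum(m[:, 0] + np.log(np.sum(np.exp(log_comp - m), axis=1))))

def score_transform(X, W, gmm):
    """Log-likelihood of X after y = A x + b, with W = [A | b] (d x (d+1))."""
    A, b = W[:, :-1], W[:, -1]
    Y = X @ A.T + b
    _, logdet = np.linalg.slogdet(A)          # Jacobian term: T * log|det A|
    return gmm_loglik(Y, *gmm) + X.shape[0] * logdet

def select_and_combine(X, transforms, gmm, n_best=2):
    """Pick the n_best transforms by GMM likelihood and interpolate them.

    The softmax weighting below is a hypothetical choice, not the paper's.
    """
    scores = np.array([score_transform(X, W, gmm) for W in transforms])
    best = np.argsort(scores)[::-1][:n_best]  # indices, highest score first
    per_frame = scores[best] / X.shape[0]
    w = np.exp(per_frame - per_frame.max())
    w /= w.sum()
    W_comb = np.tensordot(w, np.stack([transforms[i] for i in best]), axes=1)
    return W_comb, best, w
```

Here `gmm` is a tuple `(weights, means, variances)` with shapes `(M,)`, `(M, d)` and `(M, d)`, and `transforms` is a list of `d x (d+1)` matrices. No decoding pass is needed because the score depends only on the segment features and the GMM.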

