16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

fMLLR Based Feature-Space Speaker Adaptation of DNN Acoustic Models

Sree Hari Krishnan Parthasarathi (1), Bjorn Hoffmeister (1), Spyros Matsoukas (1), Arindam Mandal (1), Nikko Strom (1), Sri Garimella (2)

(1), USA
(2), India

We investigate the problem of speaker adaptation of DNN acoustic models in two settings: the traditional unsupervised adaptation and a supervised adaptation (SuA) where a few minutes of transcribed speech is available. SuA presents additional difficulties when a test speaker's adaptation information does not match the registered speaker's information. Employing feature-space maximum likelihood linear regression (fMLLR) transformed features as side-information to the DNN, we reintroduce some classical ideas for combining adapted and unadapted features: early and late fusion methods, as well as the estimation of the fMLLR transforms using simple target models (STMs). Results show that early fusion helps DNNs generalize better when features are combined after a non-linear bottleneck layer, while late fusion improves robustness, specifically in mismatched cases. STMs give consistent improvements in both settings.
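To make the terminology in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that an fMLLR transform is applied to each feature frame as an affine map x' = Ax + b, that early fusion means concatenating unadapted and adapted features into one network input (the abstract reports the best results when this happens after a non-linear bottleneck layer, which this sketch does not model), and that late fusion means combining the outputs of two separate systems. The log-linear interpolation used for late fusion is an illustrative assumption, as are all function names and the `weight` parameter.

```python
import math

def apply_fmllr(frame, A, b):
    """Apply an affine feature-space transform x' = A x + b to one frame."""
    return [sum(a * x for a, x in zip(row, frame)) + b_i
            for row, b_i in zip(A, b)]

def early_fusion(unadapted, adapted):
    """Concatenate unadapted and adapted features into one input vector."""
    return list(unadapted) + list(adapted)

def late_fusion(post_unadapted, post_adapted, weight=0.5):
    """Combine two posterior vectors by log-linear interpolation.

    Assumption for illustration only: the abstract does not say how the
    two systems' outputs are combined.
    """
    logs = [(1.0 - weight) * math.log(p) + weight * math.log(q)
            for p, q in zip(post_unadapted, post_adapted)]
    peak = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 2-dimensional example with made-up numbers.
frame = [1.0, 2.0]
A = [[2.0, 0.0], [0.0, 0.5]]   # hypothetical speaker transform matrix
b = [0.1, -0.1]                # hypothetical bias vector
adapted = apply_fmllr(frame, A, b)
fused_input = early_fusion(frame, adapted)
fused_posteriors = late_fusion([0.7, 0.3], [0.6, 0.4])
```

In the early-fusion case a single network sees both views of the signal, while in the late-fusion case each view gets its own model and only the outputs are merged, which is one plausible reason the abstract reports late fusion as more robust when the adaptation data does not match the test speaker.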


Bibliographic reference. Parthasarathi, Sree Hari Krishnan / Hoffmeister, Bjorn / Matsoukas, Spyros / Mandal, Arindam / Strom, Nikko / Garimella, Sri (2015): "fMLLR based feature-space speaker adaptation of DNN acoustic models", In INTERSPEECH-2015, 3630-3634.