16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Ensemble Speaker Modeling Using Speaker Adaptive Training Deep Neural Network for Speaker Adaptation

Sheng Li (1), Xugang Lu (2), Yuya Akita (1), Tatsuya Kawahara (1)

(1) Kyoto University, Japan
(2) NICT, Japan

In this paper, we introduce an ensemble speaker modeling method based on a speaker adaptive training (SAT) deep neural network (SAT-DNN). We first train a speaker-independent DNN (SI-DNN) acoustic model as a universal speaker model (USM). Starting from the USM, a SAT-DNN is used to obtain a set of speaker-dependent models, under the assumption that all layers except one speaker-dependent (SD) layer are shared among speakers. The speaker ensemble matrix is created by concatenating the SD weight matrices of all speakers. Using a matrix factorization technique, an ensemble speaker subspace is extracted from this matrix. At test time, an initial model for each target speaker is selected in this ensemble speaker subspace, and adaptation is then carried out to obtain the final acoustic model. To reduce the number of adaptation parameters, a low-rank speaker subspace is further explored. We test our algorithm on a lecture transcription task. Experimental results show that the proposed method is effective for unsupervised speaker adaptation.
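
The idea of stacking SD-layer weights into an ensemble matrix and factorizing it can be illustrated with a small sketch. The abstract does not say which factorization is used, so the code below assumes a truncated SVD of the mean-centered ensemble matrix. All function names, the layer shape, and the synthetic weights are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np


def build_ensemble_matrix(sd_weights):
    """Stack each speaker's flattened SD-layer weight matrix as one column."""
    return np.stack([w.ravel() for w in sd_weights], axis=1)


def extract_subspace(ensemble, rank):
    """Return the mean SD weights and an orthonormal rank-`rank` basis.

    Assumption: the factorization is a truncated SVD of the centered matrix.
    """
    mean = ensemble.mean(axis=1, keepdims=True)
    u, _, _ = np.linalg.svd(ensemble - mean, full_matrices=False)
    return mean, u[:, :rank]


def project(weights, mean, basis):
    """Coordinates of one speaker's SD weights in the speaker subspace."""
    return basis.T @ (weights.reshape(-1, 1) - mean)


def reconstruct(coords, mean, basis, shape):
    """Map subspace coordinates back to an SD-layer weight matrix."""
    return (mean + basis @ coords).reshape(shape)


# Synthetic data (hypothetical): shared weights plus a low-rank speaker offset.
rng = np.random.default_rng(0)
n_speakers, shape, rank = 20, (8, 8), 3
shared = rng.standard_normal(shape)
directions = rng.standard_normal((rank, *shape))
sd_weights = [
    shared + np.tensordot(rng.standard_normal(rank), directions, axes=1)
    for _ in range(n_speakers)
]

ensemble = build_ensemble_matrix(sd_weights)   # one column per speaker
mean, basis = extract_subspace(ensemble, rank)
coords = project(sd_weights[0], mean, basis)   # only `rank` numbers per speaker
approx = reconstruct(coords, mean, basis, shape)
```

In this toy setting a speaker's SD layer is described by `rank` coordinates instead of a full weight matrix, which is one way to read the abstract's point about reducing the number of adaptation parameters. How the initial model is selected and how adaptation proceeds in the actual system is not specified in the abstract and is not modeled here.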

Bibliographic reference. Li, Sheng / Lu, Xugang / Akita, Yuya / Kawahara, Tatsuya (2015): "Ensemble speaker modeling using speaker adaptive training deep neural network for speaker adaptation", in INTERSPEECH-2015, 2892-2896.