In this work we present intermediate-layer deep neural network (DNN) adaptation techniques upon which we build both offline and iterative speaker adaptation for online applications. We motivate our online work with task completion in Microsoft's personal voice assistant, where we investigate different adaptation styles within a speech session, e.g.: (a) adapt the speaker-independent (SI) model on the current utterance; (b) recursively adapt an incremental speaker-dependent (SD) model in the session on just the previous utterance; (c) adapt the SI model on all past utterances in the session. We considered a number of adaptation techniques and demonstrated that the intermediate-layer approach of inserting and adapting a linear layer on top of an intermediate singular-value-decomposition (SVD) layer provides the best results for offline adaptation, where we obtained 22.6% and 12% relative reduction in word-error-rate (WER) for supervised and unsupervised adaptation, respectively, on 100 utterances. An alternative intermediate-layer recursive adaptation over a 5-utterance session provided a 6% relative reduction in WER for online applications.
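As a rough illustration of the inserting-and-adapting idea, the sketch below factors one hidden-layer weight matrix by SVD into a low-rank bottleneck and inserts a small square linear layer there, initialized to the identity so the adapted network starts exactly as the SI model; only that layer is then updated. The dimensions, the toy speaker-dependent target, and the plain gradient-descent update are illustrative assumptions, not the paper's actual training recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# One hidden layer of a (hypothetical) speaker-independent DNN.
d_in, d_out, rank = 8, 8, 4
W = rng.standard_normal((d_out, d_in))

# SVD restructuring: W is approximated by A @ B with an inner "rank" bottleneck.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * s[:rank]   # (d_out x rank)
B = Vt[:rank, :]             # (rank x d_in)

# Inserted adaptation layer at the bottleneck, initialized to the identity,
# so forward(x, np.eye(rank)) reproduces the low-rank SI model exactly.
L = np.eye(rank)

def forward(x, L):
    """Low-rank layer with the inserted linear adaptation layer L."""
    return A @ (L @ (B @ x))

# Adapt only L toward a toy speaker-dependent target for one adaptation
# utterance, by gradient descent on squared error (A and B stay frozen).
x = rng.standard_normal(d_in)
target = forward(x, np.eye(rank)) + 0.1 * rng.standard_normal(d_out)

for _ in range(1000):
    err = forward(x, L) - target            # output error (d_out,)
    grad = A.T @ np.outer(err, B @ x)       # dE/dL for E = 0.5*||err||^2
    L -= 0.005 * grad
```

Because the inserted layer is only rank x rank, far fewer parameters are adapted than in the full weight matrix, which is what makes per-session adaptation on a handful of utterances feasible.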
Bibliographic reference. Kumar, Kshitiz / Liu, Chaojun / Yao, Kaisheng / Gong, Yifan (2015): "Intermediate-layer DNN adaptation for offline and session-based iterative speaker adaptation", In INTERSPEECH-2015, 1091-1095.