INTERSPEECH 2015

We propose a regularized sequence-level (SEQ) deep neural network (DNN)
model adaptation methodology as an extension of the previous KL-divergence
regularized cross-entropy (CE) adaptation [1]. In this approach, the
negative KL-divergence between the baseline and the adapted model is
added to the maximum mutual information (MMI) criterion as regularization in
the sequence-level adaptation.
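The regularization principle from [1], which the proposed SEQ adaptation extends, can be sketched at the frame level: minimizing cross-entropy against a target interpolated between the one-hot label and the baseline model's posterior is equivalent, up to a constant, to adding a weighted KL-divergence term to the CE objective. A minimal pure-Python illustration (function and variable names are ours, not from the paper; the sequence-level case analogously adds the negative KL term to the MMI criterion):

```python
import math

def kl_regularized_ce_loss(p_adapted, p_baseline, labels, rho):
    """Frame-averaged KL-regularized cross-entropy adaptation loss.

    Minimizing CE against the interpolated target
        (1 - rho) * one_hot(label) + rho * p_baseline
    equals minimizing CE + rho * KL(p_baseline || p_adapted), up to a
    constant independent of the adapted model. rho = 0 recovers plain
    unregularized CE adaptation; rho = 1 keeps the baseline posteriors.
    """
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for p_a, p_b, y in zip(p_adapted, p_baseline, labels):
        for k, pa_k in enumerate(p_a):
            hard = 1.0 if k == y else 0.0           # one-hot label target
            target = (1.0 - rho) * hard + rho * p_b[k]  # interpolated target
            total -= target * math.log(pa_k + eps)
    return total / len(labels)
```

Larger rho pulls the adapted model toward the baseline, which is what makes the adaptation robust with as few as 25 utterances.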
We compared eight
different adaptation setups specified by the baseline training criterion,
the adaptation criterion, and the regularization methodology. We found
that the proposed sequence-level adaptation consistently outperforms
the cross-entropy adaptation. For both of them, regularization is critical.
We further introduced a unified formulation of which the regularized
CE and SEQ adaptation are special cases.
We applied the proposed
approach to speaker adaptation and accent adaptation in a mobile short
message dictation task. For the speaker adaptation, with 25 or 100
utterances, the proposed approach yields a 13.72% or 23.18% WER reduction
when adapting from the CE baseline, compared with 11.87% or 20.18% for
the CE adaptation. For the accent adaptation, with 1K utterances, the
proposed approach yields an 18.74% or 19.50% WER reduction when adapting
from the CE-DNN or the SEQ-DNN. The WER reduction using the regularized
CE adaptation is 15.98% and 15.69%, respectively.
Bibliographic reference. Huang, Yan / Gong, Yifan (2015): "Regularized sequence-level deep neural network model adaptation", In INTERSPEECH-2015, 1081-1085.