We propose a regularized sequence-level (SEQ) deep neural network (DNN)
model adaptation methodology as an extension of the previous KL-divergence
regularized cross-entropy (CE) adaptation. In this approach, the
negative KL-divergence between the baseline and the adapted model is
added to the maximum mutual information (MMI) objective as a
regularization term for sequence-level adaptation.
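As a rough illustration only (the paper's exact formulation and hyperparameters are not reproduced here), the regularized objective can be sketched as the sequence-level criterion minus a weighted frame-level KL term between baseline and adapted senone posteriors; the function and argument names below are hypothetical:

```python
import numpy as np

def kl_regularized_objective(seq_objective, p_base, p_adapt, rho):
    """Illustrative sketch: subtract rho times the mean KL divergence
    KL(p_base || p_adapt) over frames from a sequence-level objective
    value (e.g. an MMI score computed elsewhere).

    p_base, p_adapt: arrays of shape [num_frames, num_senones],
    each row a posterior distribution from the baseline and the
    adapted model, respectively.
    """
    eps = 1e-12  # numerical floor to avoid log(0)
    kl_per_frame = np.sum(
        p_base * (np.log(p_base + eps) - np.log(p_adapt + eps)), axis=1
    )
    # Larger rho pulls the adapted model toward the baseline.
    return seq_objective - rho * np.mean(kl_per_frame)
```

When the adapted posteriors equal the baseline posteriors, the KL term vanishes and the objective reduces to the unregularized sequence criterion; as rho grows, deviation from the baseline model is penalized more heavily, which is the intended regularization effect.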
We compared eight adaptation setups, specified by the baseline training criterion, the adaptation criterion, and the regularization methodology. We found that the proposed sequence-level adaptation consistently outperforms cross-entropy adaptation, and that regularization is critical for both. We further introduced a unified formulation of which the regularized CE and SEQ adaptation are special cases.
We applied the proposed approach to speaker adaptation and accent adaptation in a mobile short message dictation task. For speaker adaptation with 25 or 100 utterances, the proposed approach yields 13.72% or 23.18% WER reduction when adapting from the CE baseline, compared to 11.87% or 20.18% for the CE adaptation. For accent adaptation with 1K utterances, the proposed approach yields 18.74% or 19.50% WER reduction when adapting from the CE-DNN or the SEQ-DNN, versus 15.98% and 15.69%, respectively, for the regularized CE adaptation.
Bibliographic reference: Huang, Yan / Gong, Yifan (2015): "Regularized sequence-level deep neural network model adaptation", in INTERSPEECH-2015, 1081-1085.