INTERSPEECH 2015
We propose a regularized sequence-level (SEQ) deep neural network (DNN)
model adaptation methodology as an extension of the previous KL-divergence
regularized cross-entropy (CE) adaptation [1]. In this approach, the
negative KL-divergence between the baseline model and the adapted model is
added to the maximum mutual information (MMI) criterion as a regularization
term in the sequence-level adaptation.
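As a rough illustration (the notation below is ours, not taken from the paper), writing $p_{\mathrm{SI}}$ for the output posteriors of the baseline (speaker-independent) model, $p_{\lambda}$ for those of the adapted model, and $\rho$ for the regularization weight, the regularized sequence-level objective to be maximized takes a form such as

\[
\mathcal{F}_{\mathrm{reg}}(\lambda) \;=\; \mathcal{F}_{\mathrm{MMI}}(\lambda) \;-\; \rho \sum_{t} D_{\mathrm{KL}}\!\left( p_{\mathrm{SI}}(s \mid o_t) \,\middle\|\, p_{\lambda}(s \mid o_t) \right),
\]

where $o_t$ denotes the acoustic observation at frame $t$; setting $\rho = 0$ recovers unregularized MMI adaptation.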
We compared eight
different adaptation setups specified by the baseline training criterion,
the adaptation criterion, and the regularization methodology. We found
that the proposed sequence-level adaptation consistently outperforms
the cross-entropy adaptation, and that regularization is critical for both.
We further introduced a unified formulation in which the regularized
CE and SEQ adaptation are special cases.
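One plausible way to read such a unified formulation (a sketch under our own assumptions, with $\alpha$ and $\rho$ as our notation; the exact form is given in the paper) is a weighted combination of the frame-level and sequence-level criteria that retains the KL term:

\[
\mathcal{F}(\lambda) \;=\; (1-\alpha)\,\mathcal{F}_{\mathrm{CE}}(\lambda) \;+\; \alpha\,\mathcal{F}_{\mathrm{MMI}}(\lambda) \;-\; \rho \sum_{t} D_{\mathrm{KL}}\!\left( p_{\mathrm{SI}}(s \mid o_t) \,\middle\|\, p_{\lambda}(s \mid o_t) \right),
\]

where $\mathcal{F}_{\mathrm{CE}}$ denotes the frame-level cross-entropy criterion written as a quantity to be maximized (i.e., the negative cross-entropy). Under this reading, $\alpha = 0$ reduces to the regularized CE adaptation of [1] and $\alpha = 1$ to the regularized SEQ adaptation above.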
We applied the proposed
approach to speaker adaptation and accent adaptation in a mobile short
message dictation task. For speaker adaptation with 25 or 100 utterances,
the proposed approach yields a 13.72% or 23.18% WER reduction when adapting
from the CE baseline, compared to 11.87% or 20.18% for the CE adaptation.
For accent adaptation with 1K utterances, the proposed approach yields an
18.74% or 19.50% WER reduction when adapting from the CE-DNN or the SEQ-DNN;
the corresponding reductions with the regularized CE adaptation are 15.98%
and 15.69%, respectively.
Bibliographic reference. Huang, Yan / Gong, Yifan (2015): "Regularized sequence-level deep neural network model adaptation", In INTERSPEECH-2015, 1081-1085.