INTERSPEECH 2015
16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Regularized Sequence-Level Deep Neural Network Model Adaptation

Yan Huang, Yifan Gong

Microsoft, USA

We propose a regularized sequence-level (SEQ) deep neural network (DNN) model adaptation methodology as an extension of the previous KL-divergence regularized cross-entropy (CE) adaptation [1]. In this approach, the negative KL-divergence between the baseline and the adapted model is added to the maximum mutual information (MMI) criterion as a regularization term for sequence-level adaptation.
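The regularized sequence-level objective can be sketched as follows; the interpolation weight ρ and the per-frame state posteriors are our notational assumptions, and the paper itself defines the exact criterion:

```latex
% Sketch of the regularized sequence-level adaptation objective:
% maximize the MMI criterion minus a KL penalty that keeps the
% adapted model's output distribution close to the baseline's.
F(\theta) \;=\; F_{\mathrm{MMI}}(\theta)
  \;-\; \rho \sum_{t} \mathrm{KL}\!\left(
      p(s \mid x_t;\, \theta_{\mathrm{base}})
      \,\big\|\,
      p(s \mid x_t;\, \theta)
    \right)
```

Here θ_base denotes the unadapted baseline DNN, θ the adapted model, and ρ ≥ 0 the regularization weight; setting ρ = 0 recovers unregularized MMI adaptation.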
    We compared eight adaptation setups, specified by the baseline training criterion, the adaptation criterion, and the regularization methodology. We found that the proposed sequence-level adaptation consistently outperforms the cross-entropy adaptation; for both, regularization is critical. We further introduced a unified formulation of which the regularized CE and SEQ adaptation are special cases.
    We applied the proposed approach to speaker adaptation and accent adaptation in a mobile short message dictation task. For speaker adaptation with 25 or 100 utterances, the proposed approach yields 13.72% or 23.18% WER reduction when adapting from the CE baseline, compared to 11.87% or 20.18% for the CE adaptation. For accent adaptation with 1K utterances, the proposed approach yields 18.74% or 19.50% WER reduction when adapting from the CE-DNN or the SEQ-DNN; the corresponding WER reductions using the regularized CE adaptation are 15.98% and 15.69%, respectively.


Bibliographic reference.  Huang, Yan / Gong, Yifan (2015): "Regularized sequence-level deep neural network model adaptation", In INTERSPEECH-2015, 1081-1085.