Improved Neural Network Initialization by Grouping Context-Dependent Targets for Acoustic Modeling

Gakuto Kurata, Brian Kingsbury


Neural Network (NN) Acoustic Models (AMs) are usually trained using context-dependent Hidden Markov Model (CD-HMM) states as independent targets. For example, the CD-HMM states A-b-2 (second variant of the beginning state of A) and A-m-1 (first variant of the middle state of A) both correspond to the phone A, and A-b-1 and A-b-2 both correspond to the context-independent HMM (CI-HMM) state A-b, but these relationships are not explicitly modeled. We propose a method that treats some neurons in the final hidden layer, just below the output layer, as dedicated neurons for phones or CI-HMM states by initializing the connections between each dedicated neuron and its corresponding CD-HMM outputs with stronger weights than the connections to other outputs. We obtained 6.5% and 3.6% relative error reductions with a DNN AM and a CNN AM, respectively, on a 50-hour English broadcast news task, and a 4.6% relative reduction with a CNN AM on a 500-hour Japanese task, in all cases after Hessian-free sequence training. The proposed method changes only the NN parameter initialization and requires no additional computation during NN training or at speech recognition run-time.
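
The grouping-based initialization can be illustrated with a short sketch. The following is a minimal NumPy sketch, not the authors' implementation: the function name init_output_weights, the boost and scale values, and the rule that the first few hidden neurons become the dedicated neurons are assumptions made here for illustration. It builds a hidden-to-output weight matrix in which one dedicated hidden neuron per group (phone or CI-HMM state) starts with stronger connections to the CD-HMM outputs in its group, while all remaining connections receive small random values.

import numpy as np

def init_output_weights(cd_state_to_group, n_hidden, boost=1.0, scale=0.01, seed=0):
    """Initialize the hidden-to-output weight matrix so that one dedicated
    hidden neuron per group (phone or CI-HMM state) has stronger initial
    connections to the CD-HMM outputs belonging to its group.

    cd_state_to_group: list mapping each CD-HMM output index to a group id.
    n_hidden: size of the final hidden layer (must be >= number of groups).
    """
    rng = np.random.default_rng(seed)
    n_out = len(cd_state_to_group)
    groups = sorted(set(cd_state_to_group))
    assert len(groups) <= n_hidden, "need at least one dedicated neuron per group"

    # Small random initialization for all connections.
    W = rng.normal(0.0, scale, size=(n_out, n_hidden))

    # Treat the first len(groups) hidden neurons as dedicated neurons and
    # add a stronger weight to the outputs that belong to their group.
    for neuron, group in enumerate(groups):
        for out_idx, g in enumerate(cd_state_to_group):
            if g == group:
                W[out_idx, neuron] += boost
    return W

# Example: five CD-HMM targets grouped by CI-HMM state
# (e.g., A-b-1 and A-b-2 both map to the group "A-b").
mapping = ["A-b", "A-b", "A-m", "A-m", "A-e"]
W = init_output_weights(mapping, n_hidden=8)

Only the initial values of the output-layer weights differ from a standard random initialization, which is consistent with the abstract's claim that training and run-time costs are unchanged.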


DOI: 10.21437/Interspeech.2016-725

Cite as

Kurata, G., Kingsbury, B. (2016) Improved Neural Network Initialization by Grouping Context-Dependent Targets for Acoustic Modeling. Proc. Interspeech 2016, 27-31.

BibTeX
@inproceedings{Kurata+2016,
  author={Gakuto Kurata and Brian Kingsbury},
  title={Improved Neural Network Initialization by Grouping Context-Dependent Targets for Acoustic Modeling},
  year={2016},
  booktitle={Interspeech 2016},
  doi={10.21437/Interspeech.2016-725},
  url={http://dx.doi.org/10.21437/Interspeech.2016-725},
  pages={27--31}
}