We formulate a generalized hybrid HMM-NN training procedure using the full sum over the hidden state sequence and identify CTC as a special case of it. We analyze the alignment behavior of such a training procedure and explain the strongly localized label outputs produced by full-sum training (also referred to as peaky or spiky behavior). We show how to avoid this behavior by using a state prior. We discuss the temporal decoupling between the output label position (time frame) and the corresponding evidence in the input observations that arises when training with BLSTM models, and we show how to overcome it by jointly training an FFNN. We implemented the Baum-Welch alignment algorithm in CUDA to enable fast soft realignments on the GPU. We have published this code, along with some of our experiments, as part of RETURNN, RWTH's extensible training framework for universal recurrent neural networks. We conclude with an experimental validation of our study on WSJ and Switchboard.
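To make the criterion concrete, here is a minimal sketch of the full-sum objective described above; the notation (in particular the prior scale \alpha) is ours and not taken verbatim from the paper. For an utterance with input features x_1^T and label sequence a_1^N, training maximizes

\[
p(a_1^N \mid x_1^T) \;\propto\; \sum_{s_1^T \,:\, s_1^T \vdash a_1^N} \; \prod_{t=1}^{T} \frac{p_\theta(s_t \mid x_1^T)}{p(s_t)^{\alpha}} \; p(s_t \mid s_{t-1}),
\]

where p_\theta(s_t \mid x_1^T) is the NN label posterior, p(s_t) the state prior with scale \alpha, p(s_t \mid s_{t-1}) the HMM transition model, and the sum runs over all state sequences s_1^T compatible with a_1^N. In this view, CTC is recovered as the special case with no state prior (\alpha = 0), uniform transitions, and a label topology extended by a blank state; the inner sum is computed efficiently with the forward-backward (Baum-Welch) recursion.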
Cite as: Zeyer, A., Beck, E., Schlüter, R., Ney, H. (2017) CTC in the Context of Generalized Full-Sum HMM Training. Proc. Interspeech 2017, 944-948, doi: 10.21437/Interspeech.2017-1073
@inproceedings{zeyer17_interspeech,
  author={Albert Zeyer and Eugen Beck and Ralf Schlüter and Hermann Ney},
  title={{CTC in the Context of Generalized Full-Sum HMM Training}},
  year={2017},
  booktitle={Proc. Interspeech 2017},
  pages={944--948},
  doi={10.21437/Interspeech.2017-1073}
}