In this paper we investigate the error criteria that are optimized during the training of artificial neural networks (ANNs). We compare the bounds of the squared error (SE) and the cross-entropy (CE) criteria, the two most popular choices in state-of-the-art implementations. The evaluation is performed on automatic speech recognition (ASR) and handwriting recognition (HWR) tasks using a hybrid HMM-ANN model. We find that with randomly initialized weights, the SE-trained ANN does not converge to a good local optimum. With a good initialization by pre-training, however, the word error rate of our best CE-trained system could be reduced from 30.9% to 30.5% on the ASR task and from 22.7% to 21.9% on the HWR task by performing a few additional "fine-tuning" iterations with the SE criterion.
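To make the two criteria concrete, the following is a minimal NumPy sketch of the SE and CE objectives for a softmax output layer with one-hot targets; it is an illustration only, not the authors' implementation, and the toy activations are invented for the example.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def squared_error(y, t):
    # SE criterion: sum of squared differences between
    # the network posteriors y and the one-hot targets t
    return np.sum((y - t) ** 2)

def cross_entropy(y, t):
    # CE criterion: negative log-probability of the target class
    # (only the target entry of t is nonzero)
    return -np.sum(t * np.log(y))

# toy output-layer activations and a one-hot target (illustrative values)
z = np.array([2.0, 1.0, 0.1])
t = np.array([1.0, 0.0, 0.0])
y = softmax(z)
print("SE:", squared_error(y, t), "CE:", cross_entropy(y, t))
```

Note that for a one-hot target the CE reduces to the negative log-posterior of the correct class, while the SE penalizes deviations in all output components.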
Bibliographic reference: Golik, Pavel / Doetsch, Patrick / Ney, Hermann (2013): "Cross-entropy vs. squared error training: a theoretical and experimental comparison", in Proc. INTERSPEECH 2013, pp. 1756-1760.