11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Investigation of Full-Sequence Training of Deep Belief Networks for Speech Recognition

Abdel-rahman Mohamed (1), Dong Yu (2), L. Deng (2)

(1) University of Toronto, Canada
(2) Microsoft Research, USA

Recently, Deep Belief Networks (DBNs) have been proposed for phone recognition and were found to achieve highly competitive performance. In the original DBNs, only frame-level information was used to train the DBN weights, although it has long been known that sequential or full-sequence information can help improve speech recognition accuracy. In this paper we investigate approaches to optimizing the DBN weights, the state-to-state transition parameters, and the language model scores using a sequential discriminative training criterion. We describe and analyze the proposed training algorithm and strategy, and discuss practical issues and how they affect the final results. We show on TIMIT that three-layer DBNs learned with the sequence-based training criterion outperform those learned with the frame-based criterion, and we explain why this gain vanishes for six-layer DBNs.
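The abstract does not spell out the sequential criterion, so the following is only a minimal sketch contrasting a frame-level objective with one common form of sequence-level discriminative objective: a linear-chain (CRF-style) conditional log-likelihood. It assumes that `scores[t][j]` stands for the DBN's top-layer output for state `j` at frame `t` and that `transitions[i][j]` stands for the state-to-state parameters; language model scores could be folded into `transitions` as bigram terms, which is also an assumption. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def frame_level_log_likelihood(scores: List[List[float]], labels: List[int]) -> float:
    """Frame-based criterion: each frame is classified independently
    with a softmax over states, and the per-frame log-probabilities are summed."""
    total = 0.0
    for frame_scores, label in zip(scores, labels):
        total += frame_scores[label] - logsumexp(frame_scores)
    return total


def sequence_level_log_likelihood(
    scores: List[List[float]],
    transitions: List[List[float]],
    labels: List[int],
) -> float:
    """Sequence-based criterion (hypothetical CRF-style form):
    log p(labels | input), where a path's score is
    sum_t scores[t][l_t] + sum_{t>0} transitions[l_{t-1}][l_t],
    normalized over all label sequences with the forward algorithm."""
    num_states = len(scores[0])

    # Score of the reference label sequence.
    path_score = scores[0][labels[0]]
    for t in range(1, len(labels)):
        path_score += transitions[labels[t - 1]][labels[t]] + scores[t][labels[t]]

    # Forward recursion: alpha[j] is the log-sum of the scores of all
    # partial paths that end in state j at the current frame.
    alpha = list(scores[0])
    for t in range(1, len(scores)):
        alpha = [
            scores[t][j]
            + logsumexp([alpha[i] + transitions[i][j] for i in range(num_states)])
            for j in range(num_states)
        ]
    log_partition = logsumexp(alpha)
    return path_score - log_partition


if __name__ == "__main__":
    scores = [[1.0, 0.5, -0.2], [0.3, 1.2, 0.0], [-0.5, 0.1, 0.9]]
    transitions = [[0.5, -1.0, 0.2], [0.0, 0.4, -0.3], [-0.6, 0.1, 0.7]]
    labels = [0, 1, 2]
    print("frame-level:", frame_level_log_likelihood(scores, labels))
    print("sequence-level:", sequence_level_log_likelihood(scores, transitions, labels))
```

With all transition scores set to zero, the partition function factorizes over frames and the two objectives coincide; the sequence criterion differs only through the coupling that the transition terms introduce between neighbouring frames. In a training setup of the kind the abstract describes, gradients of the sequence objective would flow into the DBN outputs as well as the transition (and language model) parameters.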


Bibliographic reference. Mohamed, Abdel-rahman / Yu, Dong / Deng, L. (2010): "Investigation of full-sequence training of deep belief networks for speech recognition", In INTERSPEECH-2010, 2846-2849.