11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Deep-Structured Hidden Conditional Random Fields for Phonetic Recognition

Dong Yu, L. Deng

Microsoft Research, USA

We extend our earlier work on the deep-structured conditional random field (DCRF) and develop the deep-structured hidden conditional random field (DHCRF). We investigate the use of this new sequential deep-learning model for phonetic recognition. The DHCRF is a hierarchical model in which the final layer is a hidden conditional random field (HCRF) and the intermediate layers are zeroth-order conditional random fields (CRFs). Parameter estimation and sequence inference in the DHCRF are carried out layer by layer. Note that training labels are available only at the final layer and the state boundaries are unknown. This difficulty is addressed by using unsupervised learning for the intermediate layers and lattice-based supervised learning for the final layer. Experiments on the TIMIT phone recognition task show a small performance improvement of a three-layer DHCRF over a two-layer DHCRF; both are superior to the single-layer DHCRF and to the discriminatively trained tri-phone HMM with the same features.
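
The layer-by-layer structure described in the abstract can be illustrated with a minimal sketch of the intermediate layers only. A zeroth-order CRF has no transition features, so its posterior factorizes over frames and reduces to a per-frame softmax. The sketch below makes several assumptions that the abstract does not state: each layer's input is the original frame features concatenated with the previous layer's posteriors, the weights are random placeholders rather than trained parameters, and the function names are hypothetical. The final HCRF layer and the unsupervised and lattice-based training procedures are not shown.

```python
import numpy as np

def zeroth_order_crf_posteriors(frames, weights, bias):
    """Per-frame state posteriors of a zeroth-order CRF (no transition features)."""
    scores = frames @ weights + bias              # shape: (num_frames, num_states)
    scores -= scores.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    probs = np.exp(scores)
    return probs / probs.sum(axis=1, keepdims=True)

def stack_intermediate_layers(frames, layers):
    """Run frames through the intermediate layers, one layer at a time.

    Assumption (not from the abstract): the input to the next layer is the
    original frames concatenated with the current layer's posteriors.
    """
    layer_input = frames
    for weights, bias in layers:
        posteriors = zeroth_order_crf_posteriors(layer_input, weights, bias)
        layer_input = np.concatenate([frames, posteriors], axis=1)
    return layer_input  # would be fed to the final HCRF layer (not shown)

# Toy dimensions and random placeholder parameters (hypothetical values).
rng = np.random.default_rng(0)
num_frames, feat_dim, num_states = 5, 4, 3
frames = rng.normal(size=(num_frames, feat_dim))
layers = [
    (rng.normal(size=(feat_dim, num_states)), np.zeros(num_states)),
    (rng.normal(size=(feat_dim + num_states, num_states)), np.zeros(num_states)),
]
final_layer_input = stack_intermediate_layers(frames, layers)
```

With two intermediate layers plus a final HCRF, this arrangement would correspond to the three-layer configuration mentioned in the abstract, under the stated assumptions about how layers are connected.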


Bibliographic reference.  Yu, Dong / Deng, L. (2010): "Deep-structured hidden conditional random fields for phonetic recognition", In INTERSPEECH-2010, 2986-2989.