15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Joint Sequence Training of Phone and Grapheme Acoustic Model Based on Multi-Task Learning Deep Neural Networks

Dongpeng Chen (1), Brian Mak (1), Sunil Sivadas (2)

(1) HKUST, China
(2) A*STAR, Singapore

Multi-task learning (MTL) can be an effective way to improve the generalization performance of individual learning tasks when the tasks are related, especially when the amount of training data is small. Our previous work applied MTL to the joint training of triphone and trigrapheme acoustic models using deep neural networks (DNNs) for low-resource speech recognition, and obtained significant recognition improvements over the corresponding DNNs trained by single-task learning (STL). In that work, both STL-DNNs and MTL-DNNs were trained by minimizing the total frame-wise cross entropy. Since phoneme and grapheme recognition are inherently sequence classification tasks, here we study the effect of sequence-discriminative training on their joint estimation using MTL-DNNs. Experimental evaluation on TIMIT phoneme recognition shows that joint sequence training significantly outperforms frame-wise training of phone and grapheme MTL-DNNs.
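
To illustrate the frame-wise criterion mentioned in the abstract, the following minimal Python sketch computes the total cross entropy of an MTL-DNN with one shared hidden layer and two softmax output layers, one for phone targets and one for grapheme targets. The layer sizes, the sigmoid hidden units, the parameter names and the helper functions are all assumptions made for illustration and are not taken from the paper; the sequence-discriminative criterion studied in the paper is not shown.

```python
import math

def affine(weights, bias, x):
    """Return W x + b, with `weights` given as one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def zero_layer(n_out, n_in):
    """Hypothetical helper: an all-zero (weights, bias) pair for a layer."""
    return [[0.0] * n_in for _ in range(n_out)], [0.0] * n_out

def mtl_frame_loss(x, phone_label, grapheme_label, params):
    """Cross entropy of one frame, summed over the two tasks."""
    # Hidden layer shared by both tasks (sigmoid units are an assumption).
    hidden = [sigmoid(h) for h in affine(*params["shared"], x)]
    # Task-specific softmax output layers on top of the shared layer.
    phone_post = softmax(affine(*params["phone"], hidden))
    graph_post = softmax(affine(*params["grapheme"], hidden))
    return (-math.log(phone_post[phone_label])
            - math.log(graph_post[grapheme_label]))

def mtl_total_loss(frames, params):
    """Total frame-wise cross entropy over (features, phone, grapheme) frames."""
    return sum(mtl_frame_loss(x, p, g, params) for x, p, g in frames)

# Toy setup: 3 input features, 5 shared hidden units, 4 phone and 3 grapheme classes.
params = {
    "shared": zero_layer(5, 3),
    "phone": zero_layer(4, 5),
    "grapheme": zero_layer(3, 5),
}
frames = [([0.2, -0.1, 0.4], 1, 2), ([0.0, 0.3, -0.5], 3, 0)]
total = mtl_total_loss(frames, params)
```

With all-zero weights both output layers produce uniform posteriors, so each frame contributes log 4 + log 3 to the total. In a real system the shared and task-specific weights would be trained jointly by gradient descent on this sum.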

Bibliographic reference. Chen, Dongpeng / Mak, Brian / Sivadas, Sunil (2014): "Joint sequence training of phone and grapheme acoustic model based on multi-task learning deep neural networks", in INTERSPEECH-2014, 1083-1087.