It has been demonstrated that speech recognition performance can be improved by adding extra articulatory information; consequently, how to use such information effectively becomes a challenging problem. In this paper, we propose an attribute-based knowledge integration architecture, realized by modeling and learning both acoustic and articulatory cues simultaneously in a unified framework. The framework improves performance by providing attribute-based knowledge in both the feature and model domains. In the model domain, attribute classification is used as a secondary task to improve the performance of an MTL-DNN used for speech recognition by strengthening its discriminative ability with respect to pronunciation. In the feature domain, an attribute-based feature is extracted from an MTL-DNN trained with attribute classification as its primary task and phone/tri-phone state classification as the secondary task. Experiments on the TIMIT and WSJ corpora show that the proposed framework achieves significant performance improvements over the baseline DNN-HMM systems.
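As a rough illustration of the model-domain setup described above, the following is a minimal PyTorch sketch, not the authors' implementation: a shared hidden stack feeds two softmax heads, with tri-phone state classification as the primary task and articulatory attribute classification as the secondary task, combined through a weighted loss. The layer sizes, attribute inventory size, and interpolation weight `alpha` are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MTLDNN(nn.Module):
    """Multi-task DNN: shared hidden layers with two task-specific
    output heads (sizes are illustrative, not from the paper)."""
    def __init__(self, feat_dim=440, hidden_dim=1024,
                 num_states=2000, num_attributes=21):
        super().__init__()
        # Shared trunk learns acoustic and articulatory cues jointly.
        self.trunk = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Primary head: tri-phone state classification.
        self.state_head = nn.Linear(hidden_dim, num_states)
        # Secondary head: articulatory attribute classification.
        self.attr_head = nn.Linear(hidden_dim, num_attributes)

    def forward(self, x):
        h = self.trunk(x)
        return self.state_head(h), self.attr_head(h)

def mtl_loss(state_logits, attr_logits, state_tgt, attr_tgt, alpha=0.3):
    # Weighted sum of primary and secondary cross-entropy losses;
    # alpha (assumed value) controls the secondary task's influence.
    return (F.cross_entropy(state_logits, state_tgt)
            + alpha * F.cross_entropy(attr_logits, attr_tgt))

# Toy usage with random frames (batch of 8 spliced feature vectors).
model = MTLDNN()
x = torch.randn(8, 440)
state_logits, attr_logits = model(x)
loss = mtl_loss(state_logits, attr_logits,
                torch.randint(0, 2000, (8,)),
                torch.randint(0, 21, (8,)))
loss.backward()
```

The feature-domain variant described in the abstract would swap the roles of the two heads, making attribute classification the primary task and deriving an attribute-based feature from the resulting network's outputs or hidden activations.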
Cite as: Zheng, H., Yang, Z., Qiao, L., Li, J., Liu, W. (2015) Attribute knowledge integration for speech recognition based on multi-task learning neural networks. Proc. Interspeech 2015, 543-547, doi: 10.21437/Interspeech.2015-200
@inproceedings{zheng15_interspeech,
  author={Hao Zheng and Zhanlei Yang and Liwei Qiao and Jianping Li and Wenju Liu},
  title={{Attribute knowledge integration for speech recognition based on multi-task learning neural networks}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={543--547},
  doi={10.21437/Interspeech.2015-200}
}