16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

A General Artificial Neural Network Extension for HTK

C. Zhang, Philip C. Woodland

University of Cambridge, UK

This paper describes the recently developed artificial neural network (ANN) modules in the HTK hidden Markov model toolkit, which enable ANN models with very general feed-forward architectures to be used for either acoustic modelling or feature extraction. The HTK ANN extension includes many recent ANN-based speech processing techniques, such as sequence training, model stacking, speaker adaptation, and parameterised activation functions. The implementation allows efficient training by supporting GPUs and various types of data cache. The ANN modules are fully integrated into the rest of the HTK toolkit, which allows existing GMM-HMM methods to be used easily in the ANN-HMM framework. Speech recognition results on a 300-hour DARPA BOLT conversational Mandarin task show that HTK can produce tandem and hybrid systems with state-of-the-art performance on this very challenging task. Furthermore, the flexibility of the implementation is illustrated using demo systems for a Wall Street Journal (WSJ) task. The HTK ANN extension is planned for release in HTK version 3.5.

Bibliographic reference.  Zhang, C. / Woodland, Philip C. (2015): "A general artificial neural network extension for HTK", In INTERSPEECH-2015, 3581-3585.