7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Recurrent Neural Network-Enhanced HMM Speech Recognition Systems

J. W. F. Thirion, Elizabeth C. Botha

University of Pretoria, South Africa

In this paper, we show how speech recognition systems can be improved by using an adaptive model transition penalty term in the Viterbi decoding process. This term is calculated from a phonemic segmentation of the speech signal, which is produced by a bi-directional recurrent neural network. No higher-level lexical knowledge (phoneme sequence) is used in the segmentation process. The method is compared with an existing technique on HTK, a state-of-the-art speech recognition system. Our technique is shown to yield significantly better phoneme recognition accuracy.
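
The idea of an adaptive transition penalty can be illustrated with a minimal sketch. The abstract does not give the penalty formula, so everything below is an assumption made for illustration: each phoneme model is reduced to a single state, the segmenter output is represented by a hypothetical per-frame boundary probability `boundary_prob`, and the penalty for switching models is taken to shrink linearly as that probability rises. The function name and the `base_penalty` parameter are likewise hypothetical and are not taken from the paper or from HTK.

```python
import numpy as np

def viterbi_with_adaptive_penalty(log_emit, boundary_prob, base_penalty=5.0):
    """Viterbi decoding over N single-state phoneme models (illustrative only).

    log_emit:      (T, N) array of per-frame log-likelihoods.
    boundary_prob: (T,) array; assumed probability that a phoneme boundary
                   lies just before frame t (stand-in for a segmenter output).
    base_penalty:  penalty for changing model where no boundary is expected.
    """
    log_emit = np.asarray(log_emit, dtype=float)
    boundary_prob = np.asarray(boundary_prob, dtype=float)
    T, N = log_emit.shape

    # Assumed form: the penalty vanishes where a boundary is certain.
    penalty = base_penalty * (1.0 - boundary_prob)
    off_diagonal = 1.0 - np.eye(N)  # 1 for a model change, 0 for staying

    score = log_emit[0].copy()
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        # trans[i, j]: score of arriving in model j from model i at frame t.
        trans = score[:, None] - penalty[t] * off_diagonal
        back[t] = np.argmax(trans, axis=0)
        score = trans[back[t], np.arange(N)] + log_emit[t]

    # Trace the best path backwards from the best final model.
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


# Toy usage: two models, six frames, with a noisy second frame.
log_emit = [[0, -2], [-2, 0], [0, -2], [-2, 0], [-2, 0], [-2, 0]]
boundary_prob = [0, 0, 0, 1, 0, 0]  # segmenter places one boundary before frame 3
path = viterbi_with_adaptive_penalty(log_emit, boundary_prob)
```

In this toy setting, a penalty that relaxes only at the hypothesised boundary lets the decoder ignore the noisy frame while still allowing the model change where the segmentation suggests one; a constant penalty applies the same cost at every frame regardless of the segmentation.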



Bibliographic reference. Thirion, J. W. F. / Botha, Elizabeth C. (2002): "Recurrent neural network-enhanced HMM speech recognition systems", in ICSLP-2002, 2637-2640.