7th International Conference on Spoken Language Processing
September 16-20, 2002
In this paper, we show how speech recognition systems can be improved by using an adaptive model transition penalty term in the Viterbi decoding process. This term is calculated from a phonemic segmentation of the speech signal, which is produced by a bi-directional recurrent neural network. No higher-level lexical knowledge (phoneme sequence) is used in the segmentation process. The method is compared with an existing technique on HTK, a state-of-the-art speech recognition system. Our technique is shown to give significantly better phoneme recognition accuracy.
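The idea of an adaptive transition penalty can be illustrated with a minimal sketch. The abstract does not give the paper's penalty formula, so everything below is an assumption made for illustration: each phoneme model is reduced to a single state, `boundary_prob[t]` stands in for a segmenter's estimate that a phoneme boundary falls at frame `t`, and the penalty for switching models is taken to be `scale * (1 - boundary_prob[t])`. The function name and parameters are hypothetical and are not part of HTK.

```python
def viterbi_adaptive_penalty(log_emis, boundary_prob, scale=5.0):
    """Viterbi decoding with a frame-dependent model transition penalty.

    log_emis[t][j]   : log-likelihood of frame t under model j
    boundary_prob[t] : assumed segmenter output in [0, 1], the estimated
                       probability that a phoneme boundary falls at frame t
    scale            : assumed weight of the penalty (hypothetical parameter)
    """
    n_frames, n_models = len(log_emis), len(log_emis[0])
    score = list(log_emis[0])
    back = []  # back[t - 1][j] = best predecessor of model j at frame t
    for t in range(1, n_frames):
        # Assumed penalty form: switching models costs more at frames
        # where the segmenter considers a boundary unlikely.
        penalty = scale * (1.0 - boundary_prob[t])
        new_score, pointers = [], []
        for j in range(n_models):
            # Staying in the same model is free; changing model is penalised.
            best_i = max(
                range(n_models),
                key=lambda i: score[i] - (penalty if i != j else 0.0),
            )
            best = score[best_i] - (penalty if best_i != j else 0.0)
            new_score.append(best + log_emis[t][j])
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final model.
    state = max(range(n_models), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy data: two models, six frames, with one noisy frame (t = 1)
# that slightly favours the wrong model.
log_emis = [
    [-1.0, -3.0],
    [-2.0, -1.5],
    [-1.0, -3.0],
    [-3.0, -1.0],
    [-3.0, -1.0],
    [-3.0, -1.0],
]
boundary_prob = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]

print(viterbi_adaptive_penalty(log_emis, boundary_prob, scale=0.0))
print(viterbi_adaptive_penalty(log_emis, boundary_prob, scale=5.0))
```

On this toy input, decoding with `scale=0.0` follows the noisy frame and yields `[0, 1, 0, 1, 1, 1]`, whereas `scale=5.0` suppresses the spurious switch and yields `[0, 0, 0, 1, 1, 1]`, because a model change is only cheap at frame 3, where the assumed segmenter signals a boundary.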
Bibliographic reference. Thirion, J. W. F. / Botha, Elizabeth C. (2002): "Recurrent neural network-enhanced HMM speech recognition systems", In ICSLP-2002, 2637-2640.