First International Conference on Spoken Language Processing (ICSLP 90)
This paper proposes a new phoneme-based speech recognition approach using neural networks trained to recognize sub-phonemes. A sub-phoneme is an acoustic unit shorter than a phoneme. Compared with conventional phoneme recognition neural networks, the sub-phoneme recognition neural networks exhibit a more precise firing pattern and smaller firing gaps around phoneme boundaries. The word or sentence score is the normalized highest sum of the output neuron firing scores, obtained with the Dynamic Time Warping (DTW) algorithm. A Time Delay Neural Network (TDNN) structure is employed for the sub-phoneme recognizer. The proposed method has been evaluated on word recognition using a continuous speech database. The results show that the recognition rate greatly improves when the sub-phoneme is introduced as the recognition unit. The best word recognition rate is obtained when each phoneme period is divided into front and rear sub-phonemes. The recognition rate is further improved by introducing a multiple-entry word dictionary.
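The scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `expand_to_subphonemes` and `word_score`, the `_front`/`_rear` label suffixes, the stay-or-advance path constraint, and normalization by the number of frames are all assumptions made for illustration. The sketch expands a word's phoneme sequence into front and rear sub-phoneme units, then uses a DTW-style dynamic program to find the monotonic alignment with the highest sum of network firing scores.

```python
import numpy as np


def expand_to_subphonemes(phonemes):
    """Split each phoneme into hypothetical front/rear sub-phoneme labels."""
    return [f"{p}_{half}" for p in phonemes for half in ("front", "rear")]


def word_score(firing, unit_index, units):
    """Normalized best-path score of a unit sequence against firing scores.

    firing     : (T, n_units) array, one row of output activations per frame
    unit_index : dict mapping a sub-phoneme label to its column in `firing`
    units      : sub-phoneme label sequence for the candidate word

    Assumed path constraint: at each frame the alignment either stays on
    the current unit or advances to the next one. Returns -inf when there
    are fewer frames than units.
    """
    T = firing.shape[0]
    S = len(units)
    # Local scores: firing of the unit expected at each position, per frame.
    local = firing[:, [unit_index[u] for u in units]]

    best = np.full((T, S), -np.inf)
    best[0, 0] = local[0, 0]  # the path must start on the first unit
    for t in range(1, T):
        for s in range(S):
            stay = best[t - 1, s]
            advance = best[t - 1, s - 1] if s > 0 else -np.inf
            best[t, s] = local[t, s] + max(stay, advance)

    # The path must end on the last unit; normalize by the frame count
    # (an assumed normalization, chosen so scores are comparable).
    return best[T - 1, S - 1] / T
```

In use, each dictionary entry would be scored with `word_score` and the highest-scoring entry selected. Under this sketch, a multiple-entry dictionary amounts to scoring several unit sequences per word and keeping the best.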
Bibliographic reference. Aikawa, Kiyoaki / Waibel, Alexander H. (1990): "Speech recognition using sub-phoneme recognition neural network", In ICSLP-1990, 685-688.