Sixth International Conference on Spoken Language Processing (ICSLP 2000)

Beijing, China
October 16-20, 2000

Application of Pattern Recognition Neural Network Model to Hearing System for Continuous Speech

Tetsuro Kitazoe, Tomoyuki Ichiki, Makoto Funamori

Department of Computer Science and Systems Engineering, Faculty of Engineering, Miyazaki University, Japan

Two- and three-layered neural networks (2LNN and 3LNN), which originate from a stereovision neural network model, are applied to speech recognition. To accommodate sequential data flow, we consider a window into which new acoustic data enter and from which the final neural activities are output. Inside the window, a recurrent neural network develops the neural activities toward a stable point. This process is called Winner-Take-All (WTA) and involves both cooperation and competition. The resulting neural activities clearly show recognition of a continuously spoken word. The string of phonemes obtained is compared with reference words by a dynamic programming method. The resulting recognition rate is 96.7% for 100 words spoken by 9 male speakers, compared with 97.9% for a hidden Markov model (HMM) with three states and a single Gaussian distribution. These results, which are close to those of the HMM, seem noteworthy because the architecture of the neural network is very simple and the parameters in the neural network equations are few in number and always fixed.
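
The step that compares the recognized phoneme string with reference words can be illustrated with a minimal sketch. The abstract does not give the cost function used in the paper, so this sketch assumes unit costs for substitution, insertion, and deletion (plain Levenshtein distance). The names `phoneme_edit_distance`, `best_match`, and `LEXICON`, as well as the three-word lexicon itself, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def phoneme_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Dynamic programming distance between two phoneme sequences.

    Assumption: every substitution, insertion, and deletion costs 1.
    The paper may use different (e.g. phoneme-dependent) costs.
    """
    # prev[j] holds the distance between the processed part of hyp and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # distance between hyp[:i] and the empty reference
        for j, r in enumerate(ref, start=1):
            substitute = prev[j - 1] + (0 if h == r else 1)
            insert = curr[j - 1] + 1
            delete = prev[j] + 1
            curr.append(min(substitute, insert, delete))
        prev = curr
    return prev[-1]


def best_match(hyp: Sequence[str], lexicon: Dict[str, List[str]]) -> Tuple[str, int]:
    """Return the reference word closest to hyp, with its distance."""
    scored = ((word, phoneme_edit_distance(hyp, phones))
              for word, phones in lexicon.items())
    return min(scored, key=lambda pair: pair[1])


# Hypothetical reference lexicon (not taken from the paper).
LEXICON: Dict[str, List[str]] = {
    "miyazaki": ["m", "i", "y", "a", "z", "a", "k", "i"],
    "miyagi": ["m", "i", "y", "a", "g", "i"],
    "nagasaki": ["n", "a", "g", "a", "s", "a", "k", "i"],
}

if __name__ == "__main__":
    # A recognized string with one phoneme error ("s" instead of "z").
    recognized = ["m", "i", "y", "a", "s", "a", "k", "i"]
    print(best_match(recognized, LEXICON))
```

With unit costs, a single misrecognized phoneme still leaves the intended word as the nearest reference, which is the property the dynamic programming comparison relies on.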



Bibliographic reference. Kitazoe, Tetsuro / Ichiki, Tomoyuki / Funamori, Makoto (2000): "Application of pattern recognition neural network model to hearing system for continuous speech", in ICSLP-2000, vol. 1, 293-296.