In this paper, we report the results of introducing a prediction layer and a similarity index into phoneme recognition experiments based on a recurrent neural network. The output layer of the proposed network consists of a prediction layer and a recognition layer. The prediction layer predicts the next input vector from the present input vectors, and the recognition layer classifies them. The purpose of the prediction layer is to transfer contextual information to the network. At every time step, the activation of the recognition layer is multiplied by the cosine of the angle between the predicted vector and the actual input vector. We call this cosine value the similarity index. When the predicted vector differs from the actual input vector, the similarity index is small, so the output of the recognition layer is reduced, which helps avoid an incorrect classification by the recognition layer.
Keywords: phoneme recognition, recurrent neural network, prediction layer, similarity index, contextual information
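The similarity index described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `similarity_index` and `gate_recognition`, the use of plain Python lists, and the handling of zero-length vectors are all assumptions made for illustration. The abstract also does not say how a negative cosine is treated, so the sketch leaves the value unchanged.

```python
import math


def similarity_index(predicted, actual):
    """Cosine of the angle between the predicted and the actual input vector."""
    dot = sum(p * a for p, a in zip(predicted, actual))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_a = math.sqrt(sum(a * a for a in actual))
    # Assumption: a zero-length vector yields an index of 0.0
    # (the abstract does not cover this case).
    if norm_p == 0.0 or norm_a == 0.0:
        return 0.0
    return dot / (norm_p * norm_a)


def gate_recognition(activations, predicted, actual):
    """Scale every recognition-layer activation by the similarity index."""
    s = similarity_index(predicted, actual)
    return [s * y for y in activations]


if __name__ == "__main__":
    activations = [0.8, 0.2]  # hypothetical recognition-layer outputs
    # Good prediction: activations pass through almost unchanged.
    print(gate_recognition(activations, [1.0, 0.1], [1.0, 0.0]))
    # Poor prediction: activations are suppressed.
    print(gate_recognition(activations, [0.0, 1.0], [1.0, 0.0]))
```

When the prediction points in the same direction as the actual input, the index is close to 1 and the recognition output is nearly unchanged; when the two vectors are orthogonal, the index is 0 and the recognition output is fully suppressed.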
Cite as: Fukuda, Y., Matsumoto, H. (1991) Phoneme recognition using recurrent neural networks. Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991), 1419-1423, doi: 10.21437/Eurospeech.1991-145
@inproceedings{fukuda91_eurospeech,
  author={Yohji Fukuda and Haruya Matsumoto},
  title={{Phoneme recognition using recurrent neural networks}},
  year=1991,
  booktitle={Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991)},
  pages={1419--1423},
  doi={10.21437/Eurospeech.1991-145}
}