Several Time-Delay Neural Network (TDNN) architectures that have been applied to speaker-dependent and multi-speaker phoneme recognition are compared with respect to their capabilities on a speaker-independent phoneme recognition task. Experiments on recognizing the voiced stops /b, d, g/ with six and twelve training speakers yielded high average recognition rates of 91.3% and 93.6%, respectively, for eight test speakers. In addition, constructing networks from speaker modules saves training time and yields higher recognition performance than a single TDNN structure of comparable network capacity. Furthermore, we propose an extended architecture for recognizing all phonemes, based on the results of this paper.
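For readers unfamiliar with TDNNs, the sketch below shows the core idea behind the architectures compared here. Each hidden unit sees a short window of consecutive frames, and the same weights are reused at every time step, which makes the network insensitive to small time shifts of the phoneme. The layer sizes (16 mel-scale coefficients over 15 frames, 8 units over 3 frames, 3 units over 5 frames, with output scores integrated over time) follow the widely cited original TDNN for /b, d, g/. They are assumptions for illustration and are not taken from this paper. The weights are random and untrained, and helper names such as `time_delay_layer` and `tdnn_bdg` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def time_delay_layer(frames, weights, biases):
    """Apply one time-delay layer (hypothetical helper).

    frames:  list of T input frames, each a list of D features.
    weights: weights[u][k][d] for unit u, delay k (0..window-1), feature d.
    biases:  one bias per unit.
    Returns T - window + 1 output frames with one activation per unit.
    The same weights are shared across all time steps.
    """
    window = len(weights[0])
    outputs = []
    for t in range(len(frames) - window + 1):
        frame_out = []
        for u, unit_w in enumerate(weights):
            s = biases[u]
            for k in range(window):
                for d, x in enumerate(frames[t + k]):
                    s += unit_w[k][d] * x
            frame_out.append(sigmoid(s))
        outputs.append(frame_out)
    return outputs


def integrate_over_time(frames):
    """Sum each unit's activation over all frames, giving one score per class."""
    return [sum(f[u] for f in frames) for u in range(len(frames[0]))]


def random_weights(units, window, dim, rng):
    return [[[rng.uniform(-0.1, 0.1) for _ in range(dim)]
             for _ in range(window)] for _ in range(units)]


def tdnn_bdg(frames, rng):
    """Hypothetical untrained TDNN producing scores for /b/, /d/, /g/."""
    h1 = time_delay_layer(frames, random_weights(8, 3, 16, rng), [0.0] * 8)  # 15 -> 13 frames
    h2 = time_delay_layer(h1, random_weights(3, 5, 8, rng), [0.0] * 3)      # 13 -> 9 frames
    return integrate_over_time(h2)


rng = random.Random(0)
speech = [[rng.random() for _ in range(16)] for _ in range(15)]  # 15 frames x 16 coefficients
scores = tdnn_bdg(speech, rng)
print(len(scores))  # 3
print(["b", "d", "g"][scores.index(max(scores))])  # arbitrary, since weights are untrained
```

In a trained network the weights would be learned by backpropagation. The modular, per-speaker construction described in the abstract is not reproduced here, because the abstract does not specify how the modules are combined.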
Cite as: Sawai, H., Nakamura, S. (1991) Time-delay neural network architectures for high-performance speaker-independent recognition. Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991), 1011-1014, doi: 10.21437/Eurospeech.1991-242
@inproceedings{sawai91_eurospeech, author={Hidefumi Sawai and Satoru Nakamura}, title={{Time-delay neural network architectures for high-performance speaker-independent recognition}}, year=1991, booktitle={Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991)}, pages={1011--1014}, doi={10.21437/Eurospeech.1991-242} }