ISCA Archive Eurospeech 1991

Time-delay neural network architectures for high-performance speaker-independent recognition

Hidefumi Sawai, Satoru Nakamura

Several Time-Delay Neural Network (TDNN) architectures applied to speaker-dependent and multi-speaker phoneme recognition are compared with respect to their capabilities on a speaker-independent phoneme recognition problem. Experiments on recognizing the voiced stops /b, d, g/ using six and twelve training speakers showed high average recognition rates of 91.3% and 93.6%, respectively, for eight test speakers. In addition, constructing networks from speaker modules is effective in saving training time, and leads to higher recognition performance than a single TDNN structure with comparable network capacity. Furthermore, we propose an extended architecture for recognizing all phonemes based on the achievements in this paper.


doi: 10.21437/Eurospeech.1991-242

Cite as: Sawai, H., Nakamura, S. (1991) Time-delay neural network architectures for high-performance speaker-independent recognition. Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991), 1011-1014, doi: 10.21437/Eurospeech.1991-242

@inproceedings{sawai91_eurospeech,
  author={Hidefumi Sawai and Satoru Nakamura},
  title={{Time-delay neural network architectures for high-performance speaker-independent recognition}},
  year=1991,
  booktitle={Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991)},
  pages={1011--1014},
  doi={10.21437/Eurospeech.1991-242}
}