Multi-State Time Delay Neural Networks (MS-TDNNs), using a new connectionist architecture with embedded time alignment, have been successfully applied to speaker-dependent continuous spoken letter recognition [1]. This shows the value of extending the classification capabilities of connectionist networks up to the word level when recognizing confusable vocabularies. This paper describes the application of MS-TDNNs to a very different task: speaker-independent, telephone-quality isolated digit recognition. The resulting 1.6% error rate demonstrates the value of embedded time alignment, since multi-feature TDNNs, which do not implement time alignment, have a 6.5% error rate on the same task. Comparisons with HMMs are also provided.
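To make "embedded time alignment" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a network has already produced a matrix of per-frame state scores for one word model, with the states in left-to-right order, and it uses a simple dynamic-programming pass to find the best monotonic alignment of frames to states. The function name `word_score`, the stay-or-advance transition rule, and the additive scoring are all illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np

def word_score(frame_scores: np.ndarray) -> float:
    """Best monotonic alignment score for one word model (hypothetical helper).

    frame_scores[t, s] is the score of state s at frame t, with states
    ordered left to right. The path must start in the first state, end in
    the last state, and at each frame either stay in the current state or
    advance to the next one (an assumed topology).
    """
    num_frames, num_states = frame_scores.shape
    best = np.full((num_frames, num_states), -np.inf)
    best[0, 0] = frame_scores[0, 0]
    for t in range(1, num_frames):
        for s in range(num_states):
            stay = best[t - 1, s]
            advance = best[t - 1, s - 1] if s > 0 else -np.inf
            best[t, s] = max(stay, advance) + frame_scores[t, s]
    # Score of the best path that ends in the final state at the last frame.
    return float(best[num_frames - 1, num_states - 1])

# Toy example: 3 frames, 2 states.
scores = np.array([[1.0, 0.0],
                   [0.0, 2.0],
                   [0.0, 3.0]])
print(word_score(scores))  # 6.0
```

Under these assumptions, recognition of an isolated word would compute one such score per vocabulary word and pick the highest. A network without this alignment step has to score a word from a fixed temporal layout instead.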
Cite as: Haffner, P., Waibel, A.H. (1991) Time-delay neural networks embedding time alignment: a performance analysis. Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991), 1415-1418, doi: 10.21437/Eurospeech.1991-144
@inproceedings{haffner91_eurospeech,
  author={Patrick Haffner and Alex H. Waibel},
  title={{Time-delay neural networks embedding time alignment: a performance analysis}},
  year=1991,
  booktitle={Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991)},
  pages={1415--1418},
  doi={10.21437/Eurospeech.1991-144}
}