Multi-State Time Delay Neural Networks (MS-TDNNs), a new connectionist architecture with embedded time alignment, have been successfully applied to speaker-dependent continuous spoken letter recognition [1]. That result shows the value of extending the classification capabilities of connectionist networks up to the word level when recognizing confusable vocabularies. This paper describes the application of MS-TDNNs to a very different task: speaker-independent, telephone-quality isolated digit recognition. The resulting 1.6% error rate demonstrates the value of embedded time alignment, since multi-feature TDNNs, which do not implement time alignment, have a 6.5% error rate on the same task. Comparisons with HMMs are also provided.
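To make "embedded time alignment" concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a network has already produced a score for every (frame, state) pair of one word model, and that a word-level score is obtained by finding the best monotone left-to-right path through those scores with dynamic programming. The function name `align_word_score`, the stay-or-advance transition rule, and the toy score matrix are all illustrative assumptions.

```python
from typing import List


def align_word_score(frame_scores: List[List[float]]) -> float:
    """Return the best path score through a T x S matrix of frame/state scores.

    Assumption (illustrative only): the path starts in the first state, ends in
    the last state, and at each frame either stays in its state or advances by one.
    """
    n_frames = len(frame_scores)
    n_states = len(frame_scores[0])
    neg_inf = float("-inf")

    # best[s] holds the best accumulated score of a path that is in state s
    # at the current frame.
    best = [neg_inf] * n_states
    best[0] = frame_scores[0][0]  # the path must start in the first state

    for t in range(1, n_frames):
        new_best = [neg_inf] * n_states
        for s in range(n_states):
            stay = best[s]
            advance = best[s - 1] if s > 0 else neg_inf
            prev = max(stay, advance)
            if prev > neg_inf:  # skip states that no valid path can reach yet
                new_best[s] = prev + frame_scores[t][s]
        best = new_best

    return best[-1]  # the path must end in the last state


# Toy example: 3 frames, 2 states (hypothetical scores).
scores = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(align_word_score(scores))  # 3.0
```

Under these assumptions, a recognizer would compute one such score per vocabulary word and pick the word with the highest score. A network without this alignment step has to classify from frame-level evidence alone.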
Bibliographic reference. Haffner, Patrick / Waibel, Alex H. (1991): "Time-delay neural networks embedding time alignment: a performance analysis", In EUROSPEECH-1991, 1415-1418.