This paper presents evaluation results and a new Time-Delay Neural Network (TDNN) structure for speaker-independent, context-independent phoneme recognition. The proposed structure integrates several TDNNs that are separated according to phoneme duration, so that it handles phonemes of varying duration more effectively. To evaluate the proposed structure, recognition of 16 English vowels was performed on 5268 vowel tokens taken from 480 sentences spoken by 140 speakers (98 male and 42 female) in the TIMIT (TI-MIT) database. The integrated TDNNs achieved a 60.5% recognition rate, up from 56% for the single TDNN structure, and a more stable recognition rate, which shows the effectiveness of the proposed structure.
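The abstract does not say how the duration-specific TDNNs are combined. As a minimal sketch of one possible reading, the Python below routes each vowel token to a classifier chosen by the token's length in frames. The duration boundaries, the function names, and the stub classifiers are all hypothetical and stand in for trained networks; none of them come from the paper.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence

# Hypothetical duration boundaries in frames (not from the paper):
# fewer than 8 frames is "short", 8 to 15 is "medium", 16 or more is "long".
DURATION_EDGES = [8, 16]

# A classifier maps a token (a sequence of feature frames) to a vowel index.
Classifier = Callable[[Sequence[Sequence[float]]], int]


def duration_bucket(num_frames: int, edges: List[int] = DURATION_EDGES) -> int:
    """Return the index of the duration class that a token falls into."""
    return bisect_right(edges, num_frames)


def classify_token(frames: Sequence[Sequence[float]],
                   experts: List[Classifier]) -> int:
    """Send a token to the classifier responsible for its duration class."""
    return experts[duration_bucket(len(frames))](frames)


def make_stub(label: int) -> Classifier:
    """Stand-in for a trained TDNN; always predicts the same vowel index."""
    return lambda frames: label


# One stub per duration class: short, medium, long.
experts = [make_stub(0), make_stub(1), make_stub(2)]

# A token of 10 frames with 16 coefficients each falls in the "medium" class.
token = [[0.0] * 16 for _ in range(10)]
print(classify_token(token, experts))
```

Hard routing by duration is only the simplest form of integration; the networks could equally be merged through a shared output layer, and the abstract does not indicate which approach the authors used.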
Cite as: Hataoka, N., Waibel, A.H. (1991) Evaluation of speaker-independent phoneme recognition on TIMIT database using TDNNs. Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991), 105-108, doi: 10.21437/Eurospeech.1991-22
@inproceedings{hataoka91_eurospeech,
  author={Nobuo Hataoka and Alex H. Waibel},
  title={{Evaluation of speaker-independent phoneme recognition on TIMIT database using TDNNs}},
  year=1991,
  booktitle={Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991)},
  pages={105--108},
  doi={10.21437/Eurospeech.1991-22}
}