Vocal Tract Length Normalization (VTLN) has been shown to be an efficient speaker normalization tool for HMM-based systems. In this paper we show that it is equally efficient for a template-based recognition system. Template-based systems, while promising, have a potential drawback: in addition to the essential phonemic properties, templates retain all non-phonetic detail, i.e. information about the speaker and the acoustic recording conditions. This may lead to very inefficient use of the database. We show that after VTLN, significantly more speakers, including speakers of the opposite gender, contribute templates to the matching sequence than in the non-normalized case. In experiments on the Wall Street Journal database, this leads to a relative word error rate reduction of 10%.
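As an illustration of what VTLN does to the frequency axis, here is a minimal sketch of a piecewise-linear warping function, which is one common choice. The abstract does not say which warping function the authors use, so the function name `vtln_warp`, the 8 kHz Nyquist frequency, and the 0.85 breakpoint ratio are all assumptions made for this example.

```python
def vtln_warp(freq_hz, alpha, nyquist_hz=8000.0, breakpoint_ratio=0.85):
    """Piecewise-linear VTLN warp of a single frequency (hypothetical helper).

    Frequencies below a breakpoint are scaled by the warp factor alpha.
    Above the breakpoint, a second linear segment maps the Nyquist
    frequency onto itself so the warped axis keeps the same range.
    All default values are illustrative assumptions.
    """
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if not 0.0 <= freq_hz <= nyquist_hz:
        raise ValueError("freq_hz must lie in [0, nyquist_hz]")

    # Place the breakpoint so its warped image stays below Nyquist.
    f0 = breakpoint_ratio * nyquist_hz * min(1.0, 1.0 / alpha)
    if freq_hz <= f0:
        return alpha * freq_hz

    # Upper segment: straight line from (f0, alpha * f0) to (nyquist, nyquist).
    slope = (nyquist_hz - alpha * f0) / (nyquist_hz - f0)
    return alpha * f0 + slope * (freq_hz - f0)
```

In a typical setup, a warp factor is chosen per speaker from a small grid around 1.0, and the warped frequency axis is used when computing the acoustic features; whether the paper follows this exact procedure is not stated in the abstract.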
Cite as: Demange, S., Van Compernolle, D. (2009) Speaker normalization for template based speech recognition. Proc. Interspeech 2009, 560-563, doi: 10.21437/Interspeech.2009-200
@inproceedings{demange09_interspeech,
  author={Sébastien Demange and Dirk Van Compernolle},
  title={{Speaker normalization for template based speech recognition}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={560--563},
  doi={10.21437/Interspeech.2009-200}
}