We describe a new method for modeling duration at the word level. These duration models are easily trained from the acoustic training data and can be used to rescore N-best lists of recognition hypotheses. The models capture some well-known durational effects, such as prepausal lengthening, and incorporate a simple backoff mechanism to handle words unseen in training during rescoring. Experiments with various large-vocabulary conversational speech recognition (LVCSR) evaluation sets showed consistent improvements of 0.7-1.0% in word error rate (WER).
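The abstract does not specify the form of the duration model, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's method. It assumes one Gaussian over log duration per word, a single global Gaussian as the backoff for rare or unseen words, and a linear combination of the recognizer score with the duration log-likelihood. The names `train`, `log_likelihood`, `rescore`, `MIN_COUNT`, and `weight` are hypothetical, and prepausal lengthening is not modeled here.

```python
import math
from collections import defaultdict

MIN_COUNT = 2  # hypothetical threshold: words seen less often use the backoff model


def fit_gaussian(values):
    """Return (mean, variance) of the values, with a variance floor for stability."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, max(var, 1e-4)


def train(samples, min_count=MIN_COUNT):
    """samples: iterable of (word, duration_in_seconds) taken from aligned training data.

    Returns per-word Gaussians over log duration and a global backoff Gaussian.
    """
    by_word = defaultdict(list)
    for word, dur in samples:
        by_word[word].append(math.log(dur))
    all_logs = [x for logs in by_word.values() for x in logs]
    backoff = fit_gaussian(all_logs)
    models = {w: fit_gaussian(logs) for w, logs in by_word.items()
              if len(logs) >= min_count}
    return models, backoff


def log_likelihood(word, dur, models, backoff):
    """Gaussian log-density of log(dur); unseen or rare words fall back to the global model."""
    mean, var = models.get(word, backoff)
    x = math.log(dur)
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def rescore(nbest, models, backoff, weight=1.0):
    """nbest: list of (base_score, [(word, duration), ...]) with higher scores better.

    Adds the weighted duration log-likelihood to each base score and
    returns the hypotheses sorted best first.
    """
    scored = []
    for base, words in nbest:
        dur_score = sum(log_likelihood(w, d, models, backoff) for w, d in words)
        scored.append((base + weight * dur_score, words))
    return sorted(scored, key=lambda t: t[0], reverse=True)
```

In this sketch, two hypotheses with equal recognizer scores are separated by how plausible their word durations are, and any word without its own model is scored by the global distribution instead of being skipped.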
Cite as: Gadde, V.R.R. (2000) Modeling word durations. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 1, 601-604, doi: 10.21437/ICSLP.2000-149
@inproceedings{gadde00_icslp,
  author={Venkata Ramana Rao Gadde},
  title={{Modeling word durations}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 1, 601-604},
  doi={10.21437/ICSLP.2000-149}
}