This paper describes preliminary recognition experiments on PhoneBook, a corpus of isolated, telephone-bandwidth, read words from a large (almost 8,000-word) vocabulary. We have chosen this corpus as a testbed for experiments on the language-model-independent parts of a segment-based recognizer. We present results showing that a segment-based recognizer performs well on this task, and that a simple Gaussian mixture phone duration model significantly reduces the error rate. We compare context-independent, stress-dependent, and word-position-dependent duration models and obtain relative error rate reductions of up to 12% on the test set. Finally, we make some observations regarding the effects of stress and word position in this isolated-word task and discuss our plans for further research using PhoneBook.
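As a rough illustration of the two quantities the abstract refers to, the sketch below scores a phone duration under a one-dimensional Gaussian mixture and computes a relative error rate reduction. It is a minimal sketch under stated assumptions, not the recognizer's actual implementation: the function names, the use of raw duration in seconds (rather than, say, log duration), and all parameter values are hypothetical and do not come from the paper.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log density of a univariate Gaussian evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def gmm_duration_log_likelihood(duration, weights, means, stds):
    """Log-likelihood of one phone duration under a 1-D Gaussian mixture.

    Uses the log-sum-exp trick so that very small component
    likelihoods do not underflow.
    """
    terms = [
        math.log(w) + gaussian_log_pdf(duration, m, s)
        for w, m, s in zip(weights, means, stds)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def relative_error_reduction(baseline_error, new_error):
    """Fraction by which new_error improves on baseline_error."""
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical two-component mixture for a single phone (seconds).
    weights = [0.6, 0.4]
    means = [0.08, 0.15]
    stds = [0.02, 0.05]
    print(gmm_duration_log_likelihood(0.10, weights, means, stds))

    # Hypothetical error rates, chosen only to illustrate the arithmetic.
    print(relative_error_reduction(10.0, 8.8))
```

In a setup like this, a stress-dependent or word-position-dependent variant would simply keep a separate set of mixture parameters per phone and context label, while the context-independent variant keeps one set per phone.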
Cite as: Livescu, K., Glass, J. (2001) Segment-based recognition on the phonebook task: initial results and observations on duration modeling. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 1437-1440, doi: 10.21437/Eurospeech.2001-23
@inproceedings{livescu01_eurospeech,
  author={Karen Livescu and James Glass},
  title={{Segment-based recognition on the phonebook task: initial results and observations on duration modeling}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={1437--1440},
  doi={10.21437/Eurospeech.2001-23}
}