Although most parameters in a speech recognition system are estimated from data, the unit inventory and lexicon are generally hand-crafted and therefore unlikely to be optimal. This paper describes a joint solution to the problems of learning a unit inventory and corresponding lexicon from data. The methodology, which requires multiple training tokens per word, is then extended to handle infrequently observed words using a hybrid system that combines automatically-derived units with phone-based units. The hybrid system outperforms a phone-based system in first-pass decoding experiments on a large vocabulary conversational speech recognition task.
Cite as: Bacchiani, M., Ostendorf, M. (1998) Using automatically-derived acoustic sub-word units in large vocabulary speech recognition. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0586, doi: 10.21437/ICSLP.1998-629
@inproceedings{bacchiani98_icslp,
  author={Michiel Bacchiani and Mari Ostendorf},
  title={{Using automatically-derived acoustic sub-word units in large vocabulary speech recognition}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0586},
  doi={10.21437/ICSLP.1998-629}
}