This paper presents the advantages of augmenting a word-based system with sub-word units as a step towards building open-vocabulary speech recognition systems. We show that a hybrid system which combines words and data-driven, variable-length sub-word units has better phone accuracy than word-only systems. In addition, the hybrid system is better at detecting Out-Of-Vocabulary (OOV) terms and representing them phonetically. Results are presented on the RT-04 broadcast news and MIT Lecture data sets. An FSM-based approach to recovering OOV words from the hybrid lattices is also presented. At an OOV rate of 2.5% on RT-04, we observed an 8% relative improvement in phone error rate (PER), a 7.3% relative improvement in oracle PER, and a 7% relative improvement in WER after recovering the OOV terms. In the OOV regions, PER is reduced by 33% relative.
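The results above are reported as relative improvements in error rates. As a reading aid, the following minimal sketch shows how a phone error rate and a relative improvement are conventionally computed. It is not the authors' scoring pipeline: the function names, the example phone sequences, and the baseline/improved values are all hypothetical and chosen only for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Errors divided by reference length (PER for phones, WER for words)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_improvement(baseline, improved):
    """Fractional reduction of an error rate with respect to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


# Hypothetical phone sequences (not taken from the paper's data).
ref_phones = ["k", "ae", "t", "s"]
hyp_phones = ["k", "ae", "p", "s"]
per = error_rate(ref_phones, hyp_phones)  # one substitution over four phones

# Hypothetical error rates: a drop from 25% to 23% is an 8% relative gain.
gain = relative_improvement(0.25, 0.23)
```

A relative figure depends on the baseline: the same 8% relative improvement corresponds to a smaller absolute change when the baseline error rate is lower.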
Cite as: Rastrow, A., Sethy, A., Ramabhadran, B., Jelinek, F. (2009) Towards using hybrid word and fragment units for vocabulary independent LVCSR systems. Proc. Interspeech 2009, 1931-1934, doi: 10.21437/Interspeech.2009-558
@inproceedings{rastrow09_interspeech,
  author={Ariya Rastrow and Abhinav Sethy and Bhuvana Ramabhadran and Frederick Jelinek},
  title={{Towards using hybrid word and fragment units for vocabulary independent LVCSR systems}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={1931--1934},
  doi={10.21437/Interspeech.2009-558}
}