INTERSPEECH 2009
10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Towards Using Hybrid Word and Fragment Units for Vocabulary Independent LVCSR Systems

Ariya Rastrow (1), Abhinav Sethy (2), Bhuvana Ramabhadran (2), Frederick Jelinek (1)

(1) Johns Hopkins University, USA
(2) IBM T.J. Watson Research Center, USA

This paper presents the advantages of augmenting a word-based system with sub-word units as a step towards building open-vocabulary speech recognition systems. We show that a hybrid system that combines words with data-driven, variable-length sub-word units achieves better phone accuracy than word-only systems. In addition, the hybrid system is better at detecting out-of-vocabulary (OOV) terms and representing them phonetically. Results are presented on the RT-04 broadcast news and MIT Lecture data sets. An FSM-based approach to recovering OOV words from the hybrid lattices is also presented. At an OOV rate of 2.5% on RT-04, we observed an 8% relative improvement in phone error rate (PER), a 7.3% relative improvement in oracle PER, and a 7% relative improvement in WER after recovering the OOV terms. In the OOV regions, a significant PER reduction of 33% relative is seen.
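The relative figures quoted above compare an error rate against a baseline. The following is a minimal sketch of how a phone error rate and a relative improvement are commonly computed, assuming PER is the Levenshtein distance between reference and hypothesis phone sequences divided by the reference length. The function names and the toy phone sequences are hypothetical illustrations and are not taken from the paper or its data.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions, insertions, deletions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference phone
                            curr[j - 1] + 1,      # insertion of a hypothesis phone
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phone_error_rate(ref, hyp):
    """PER = edit distance / number of reference phones (assumed definition)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_improvement(baseline, improved):
    """Fraction by which the error rate drops relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


# Toy phone sequences, purely illustrative (not results from the paper).
reference = ["k", "ae", "t", "s"]
baseline_hyp = ["k", "ae", "p"]   # one substitution and one deletion
hybrid_hyp = ["k", "ae", "t"]     # one deletion

baseline_per = phone_error_rate(reference, baseline_hyp)
hybrid_per = phone_error_rate(reference, hybrid_hyp)
gain = relative_improvement(baseline_per, hybrid_per)
```

Under this definition, a relative improvement is measured against the baseline error rate rather than as an absolute difference in percentage points.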


Bibliographic reference. Rastrow, Ariya / Sethy, Abhinav / Ramabhadran, Bhuvana / Jelinek, Frederick (2009): "Towards using hybrid word and fragment units for vocabulary independent LVCSR systems", in INTERSPEECH-2009, 1931-1934.