ISCA Archive Interspeech 2013

Finding recurrent out-of-vocabulary words

Long Qin, Alexander Rudnicky

Out-of-vocabulary (OOV) words can appear more than once in a conversation or over a period of time. Such multiple instances of the same OOV word provide valuable information for estimating the pronunciation or the part-of-speech (POS) tag of the word. However, a conventional OOV word detection system recognizes and treats each OOV word individually. We therefore investigate how to identify recurrent OOV words in speech recognition. Specifically, we propose to cluster multiple instances of the same OOV word using a bottom-up approach. Phonetic, acoustic and contextual features are collected to measure the distance between OOV candidates. The experimental results show that the bottom-up clustering approach is very effective at detecting the recurrence of OOV words. We also find that the phonetic feature performs better than the acoustic and contextual features, and that the best performance is achieved when all features are combined.
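To make the bottom-up clustering idea concrete, the following is a minimal sketch and not the authors' implementation. It uses only a phonetic distance, and every specific choice in it is an assumption made for illustration: the length-normalized edit distance over phone sequences, the average-linkage merge rule, the stopping threshold of 0.45, and the names `phone_edit_distance` and `bottom_up_cluster`. The abstract does not specify how the paper's distances are computed or how the acoustic and contextual features are combined with the phonetic one.

```python
from itertools import combinations


def phone_edit_distance(a, b):
    """Levenshtein distance between two phone sequences,
    normalized by the longer length (an assumed phonetic distance)."""
    m, n = len(a), len(b)
    if max(m, n) == 0:
        return 0.0
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[n] / max(m, n)


def bottom_up_cluster(items, distance, threshold):
    """Agglomerative clustering: start with one cluster per item and
    repeatedly merge the closest pair (average linkage, assumed) until
    the closest pair is farther apart than `threshold`.
    Returns clusters as lists of indices into `items`."""
    clusters = [[i] for i in range(len(items))]

    def linkage(c1, c2):
        total = sum(distance(items[i], items[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (x, y), d = min(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]  # x < y, so deleting y is safe
        del clusters[y]
    return clusters


# Hypothetical OOV candidates, given as decoded phone sequences.
candidates = [
    ["K", "AE", "T"],
    ["K", "AE", "T", "S"],
    ["D", "AO", "G"],
    ["K", "AA", "T"],
]
clusters = bottom_up_cluster(candidates, phone_edit_distance, threshold=0.45)
```

With these toy inputs, `clusters` is `[[0, 1, 3], [2]]`: the three phonetically similar candidates are grouped as one recurrent word and the fourth stays alone. Additional features could be folded in by passing a different `distance` function, for example a weighted sum of several distances, although the weighting would again be an assumption.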


doi: 10.21437/Interspeech.2013-527

Cite as: Qin, L., Rudnicky, A. (2013) Finding recurrent out-of-vocabulary words. Proc. Interspeech 2013, 2242-2246, doi: 10.21437/Interspeech.2013-527

@inproceedings{qin13_interspeech,
  author={Long Qin and Alexander Rudnicky},
  title={{Finding recurrent out-of-vocabulary words}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={2242--2246},
  doi={10.21437/Interspeech.2013-527}
}