11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Similarity Scoring for Recognizing Repeated Out-of-Vocabulary Words

Mirko Hannemann, Stefan Kombrink, Martin Karafiát, Lukáš Burget

Brno University of Technology, Czech Republic

We develop a similarity measure to detect repeatedly occurring out-of-vocabulary (OOV) words, since these carry important information. Sub-word sequences in the output of a hybrid word/sub-word recognizer are taken as detected OOVs and are aligned to each other with the help of an alignment error model. This model can handle partial OOV detections and attempts to reveal more complex word relations, such as compound words. We apply the model to a selection of conversational telephone calls to retrieve other occurrences of the same OOV and to obtain a higher-level description of it, for example that it is a derivation of a known word.


Bibliographic reference.  Hannemann, Mirko / Kombrink, Stefan / Karafiát, Martin / Burget, Lukáš (2010): "Similarity scoring for recognizing repeated out-of-vocabulary words", In INTERSPEECH-2010, 897-900.