14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Unsupervised Topic Adaptation for Morph-Based Speech Recognition

André Mansikkaniemi, Mikko Kurimo

Aalto University, Finland

Topic adaptation in automatic speech recognition (ASR) refers to the adaptation of the language model (LM) and vocabulary for improved recognition of in-domain speech data. In this work we implement unsupervised topic adaptation for morph-based ASR to improve the recognition of foreign entity names. Based on the first-pass ASR hypothesis, similar texts are selected from a collection of articles and used to adapt the background language model. Latent semantic indexing (LSI) is used to index the adaptation corpus and the ASR output. We evaluate three types of index terms for their usefulness in unsupervised LM adaptation: statistical morphs, words, and a combination of morphs and words. Furthermore, we implement vocabulary adaptation alongside unsupervised LM adaptation. Foreign word candidates are selected from the in-domain texts based on how likely they are to be topic-related foreign entity names, and adapted pronunciation rules are generated for the selected foreign words. Morpheme adaptation is also performed by restoring over-segmented foreign words to their base forms, to ensure more reliable pronunciation modeling.
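The text-selection step (index the article collection with LSI, fold the first-pass hypothesis into the same latent space, keep the most similar articles) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes whitespace-tokenized words as index terms, raw term counts as weights, and cosine similarity in the reduced space. The paper's actual term weighting, latent dimensionality, and morph-based index terms (which would require a morph segmenter, not shown) are not specified here, and the names `build_term_doc_matrix` and `LsiIndex` are hypothetical.

```python
import numpy as np


def build_term_doc_matrix(docs, vocab=None):
    """Raw term counts (hypothetical weighting): rows are index terms, columns are documents."""
    tokenized = [d.lower().split() for d in docs]
    if vocab is None:
        vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for t in toks:
            if t in index:  # out-of-vocabulary terms are ignored
                A[index[t], j] += 1.0
    return A, vocab


class LsiIndex:
    """Hypothetical LSI index over an adaptation corpus."""

    def __init__(self, docs, k):
        A, self.vocab = build_term_doc_matrix(docs)
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        # Keep at most k dimensions, dropping (near-)zero singular values.
        k = min(k, int(np.sum(s > 1e-10)))
        self.U_k = U[:, :k]
        self.s_k = s[:k]
        # Row j is document j in the k-dimensional latent space.
        self.doc_vecs = Vt[:k, :].T

    def fold_in(self, text):
        """Project a new text (e.g. an ASR hypothesis): q_hat = Sigma_k^-1 U_k^T q."""
        q, _ = build_term_doc_matrix([text], vocab=self.vocab)
        return (self.U_k.T @ q[:, 0]) / self.s_k

    def top_matches(self, hypothesis, n):
        """Indices of the n documents most cosine-similar to the hypothesis."""
        q = self.fold_in(hypothesis)
        norms = np.linalg.norm(self.doc_vecs, axis=1) * np.linalg.norm(q)
        sims = (self.doc_vecs @ q) / np.where(norms == 0, 1.0, norms)
        return [int(i) for i in np.argsort(-sims)[:n]]


if __name__ == "__main__":
    articles = [
        "football match goal striker league",
        "election parliament vote minister party",
        "striker scored goal in the league match",
        "minister party vote coalition",
    ]
    index = LsiIndex(articles, k=2)
    print(index.top_matches("the striker scored a late goal", 2))
```

In this toy corpus the two sports articles are returned for a sports-like hypothesis. In the adaptation setting described above, the selected articles would then serve as in-domain text for adapting the background LM.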


Bibliographic reference.  Mansikkaniemi, André / Kurimo, Mikko (2013): "Unsupervised topic adaptation for morph-based speech recognition", In INTERSPEECH-2013, 2693-2697.