8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Online Vocabulary Adaptation Using Limited Adaptation Data

C. E. Liu, K. Thambiratnam, F. Seide

Microsoft Research Asia, China

This paper presents a study of low-latency, domain-independent online vocabulary adaptation using limited amounts of supporting text data. The target applications include blind indexing of Internet content, indexing of new content with low latency, and domains where out-of-vocabulary (OOV) words are problematic. Several methods for document-specific adaptation that use a small amount of supporting metadata and the Internet are examined. A combination of word feature fusion and cross-file statistics pooling is shown to provide robust adaptation. The best evaluated method achieved an absolute reduction of 27.6% in OOV detection false alarm rate relative to the baseline word feature thresholding methods.

Bibliographic reference. Liu, C. E. / Thambiratnam, K. / Seide, F. (2007): "Online vocabulary adaptation using limited adaptation data", In INTERSPEECH-2007, 1821-1824.