8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Adapting Language Models for Frequent Fixed Phrases by Emphasizing N-Gram Subsets

Tomoyosi Akiba (1), Katunobu Itou (1), Atsushi Fujii (2)

(1) AIST, Japan
(2) University of Tsukuba, Japan

In support of speech-driven question answering, we propose a method for constructing N-gram language models that recognize spoken questions with high accuracy. Question-answering systems receive queries that often consist of two parts: one conveys the query topic, and the other is a fixed phrase commonly used in query sentences. A language model constructed from the target collection of the QA task (e.g., newspaper articles) can model the former part, but cannot appropriately model the latter. We tackle this problem as task adaptation from language models obtained from background corpora (e.g., newspaper articles) to the fixed phrases, and propose a method that does not require a task-specific corpus, which is often difficult to obtain, but instead uses only a manually compiled list of fixed phrases. The method emphasizes the subset of N-grams obtained from a background corpus that corresponds to the fixed phrases in the list. Theoretically, this method can be regarded as maximum a posteriori probability (MAP) estimation that uses the N-gram subset as the a posteriori distribution. Experiments demonstrate the effectiveness of our method.
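The core idea of emphasizing an N-gram subset can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes bigram counts from a background corpus, extracts the bigrams covered by a hand-listed set of fixed phrases, and scales their counts by an emphasis weight before maximum-likelihood estimation. All function names and the weight value are illustrative.

```python
from collections import Counter

def emphasize_ngram_subset(bigram_counts, fixed_phrases, weight=10.0):
    """Scale counts of bigrams that occur inside the listed fixed phrases.

    bigram_counts: Counter mapping (w1, w2) -> count from the background corpus
    fixed_phrases: iterable of phrases, each a tuple of words
    weight: emphasis factor (illustrative; in practice it would be tuned)
    """
    # Collect the bigram subset covered by the fixed phrases.
    subset = set()
    for phrase in fixed_phrases:
        for i in range(len(phrase) - 1):
            subset.add((phrase[i], phrase[i + 1]))
    # Emphasize: multiply counts of the subset before estimating probabilities.
    emphasized = Counter()
    for bigram, count in bigram_counts.items():
        emphasized[bigram] = count * weight if bigram in subset else count
    return emphasized

def conditional_probs(counts):
    """Maximum-likelihood estimate of P(w2 | w1) from bigram counts."""
    totals = Counter()
    for (w1, _), count in counts.items():
        totals[w1] += count
    return {(w1, w2): count / totals[w1] for (w1, w2), count in counts.items()}
```

For example, with counts `{("what", "is"): 5, ("what", "time"): 5}` and the fixed phrase `("what", "is")` emphasized by a weight of 3, the estimate of P(is | what) rises from 0.5 to 15/20 = 0.75, shifting probability mass toward the fixed-phrase bigram.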

Full Paper

Bibliographic reference. Akiba, Tomoyosi / Itou, Katunobu / Fujii, Atsushi (2003): "Adapting language models for frequent fixed phrases by emphasizing n-gram subsets", in EUROSPEECH-2003, 1469-1472.