September 22-25, 1997
This paper describes an N-gram language model adaptation technique. Because an N-gram model requires a large sample corpus for probability estimation, it is difficult to apply an N-gram model to a small, specific task. In this paper, N-gram task adaptation is proposed using a large corpus from a general task (task-independent or TI text) and a small corpus from the specific task (AD text). A simple weighting scheme is used to mix the TI and AD texts. In addition to mixing the two texts, the effect of vocabulary size is also investigated. The experimental results show that the adapted N-gram model with an appropriate vocabulary size has significantly lower perplexity than the task-independent models.
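The abstract does not give the weighting formula or the smoothing method. The following is a minimal sketch that assumes the "simple weighting" multiplies each AD count by a constant factor before pooling it with the TI counts. It shows bigram TI/AD mixing, vocabulary truncation to the most frequent words, and perplexity evaluation. The function names, the bigram order, the add-one smoothing, and building the vocabulary from the pooled text are all illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter
from typing import List, Set, Tuple

UNK, BOS = "<unk>", "<s>"
Sentence = List[str]
Model = Tuple[Counter, Counter]  # (weighted bigram counts, weighted history counts)


def build_vocab(sentences: List[Sentence], size: int) -> Set[str]:
    # Keep the `size` most frequent words; every other word will map to <unk>.
    freq = Counter(w for s in sentences for w in s)
    return {w for w, _ in freq.most_common(size)}


def tokens(sentence: Sentence, vocab: Set[str]) -> List[str]:
    # Prepend a sentence-start marker and replace out-of-vocabulary words by <unk>.
    return [BOS] + [w if w in vocab else UNK for w in sentence]


def bigram_counts(sentences: List[Sentence], vocab: Set[str]) -> Counter:
    # Count (previous word, word) pairs after vocabulary mapping.
    counts: Counter = Counter()
    for s in sentences:
        toks = tokens(s, vocab)
        counts.update(zip(toks, toks[1:]))
    return counts


def mixed_model(ti: List[Sentence], ad: List[Sentence],
                weight: float, vocab: Set[str]) -> Model:
    # Assumed weighting: one AD bigram counts `weight` times as much as one TI bigram.
    counts = bigram_counts(ti, vocab)
    for bigram, c in bigram_counts(ad, vocab).items():
        counts[bigram] += weight * c
    # History counts are the weighted totals for each previous word.
    history: Counter = Counter()
    for (prev, _), c in counts.items():
        history[prev] += c
    return counts, history


def bigram_prob(model: Model, vocab: Set[str], prev: str, word: str) -> float:
    # Add-one smoothing over the vocabulary plus <unk> (a simple, assumed choice).
    counts, history = model
    v = len(vocab) + 1
    return (counts[(prev, word)] + 1) / (history[prev] + v)


def perplexity(model: Model, sentences: List[Sentence], vocab: Set[str]) -> float:
    # exp of the average negative log probability per predicted token.
    log_sum, n = 0.0, 0
    for s in sentences:
        toks = tokens(s, vocab)
        for prev, word in zip(toks, toks[1:]):
            log_sum += math.log(bigram_prob(model, vocab, prev, word))
            n += 1
    return math.exp(-log_sum / n)


if __name__ == "__main__":
    # Toy data: TI is general text, AD is a tiny task-specific sample.
    ti = [["the", "flight", "leaves", "at", "noon"], ["the", "train", "is", "late"]]
    ad = [["book", "a", "flight"], ["book", "a", "room"]]
    vocab = build_vocab(ti + ad, size=100)
    test = [["book", "a", "flight"]]
    for w in (0.0, 10.0):
        print(w, perplexity(mixed_model(ti, ad, w, vocab), test, vocab))
```

On this toy data, raising the AD weight from 0 to 10 lowers the perplexity of the AD-style test sentence. Weighting counts in a single table keeps the sketch short; linearly interpolating separately trained TI and AD models is a common alternative.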
Bibliographic reference. Ito, Akinori / Saitoh, Hideyuki / Katoh, Masaharu / Kohda, Masaki (1997): "N-gram language model adaptation using small corpus for spoken dialog recognition", In EUROSPEECH-1997, 2735-2738.