This paper investigates the impact of word classing on a recently proposed shrinkage-based language model, Model M. Model M, a class-based n-gram model, has been shown to significantly outperform word-based n-gram models on a variety of domains. In past work, word classes for Model M were induced automatically from unlabeled text using the algorithm of Brown et al. We take a closer look at the classing and examine whether improved classing also translates into improved performance. In particular, we explore the use of manually assigned classes, part-of-speech (POS) tags, and dialog state information, considering both hard classing and soft classing. In experiments with a conversational dialog system (human-machine dialog) and a speech-to-speech translation system (human-human dialog), we find that better classing can improve Model M performance by up to 3% absolute in word error rate.
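To make the distinction between hard and soft classing concrete, the following is a minimal sketch of a generic class-based bigram in the style of Brown et al., where p(w | w_prev) is factored as p(c | c_prev) * p(w | c). It is not Model M itself, which is a shrinkage-based model with its own parameterization. The function names, class labels, and every probability in the toy tables are hypothetical and chosen only for illustration; the soft variant shown is one possible formulation, assumed here rather than taken from the paper.

```python
# Toy illustration only: all class names and probabilities are made up.

def hard_class_bigram(prev_word, word, word2class, p_cc, p_wc):
    """p(word | prev_word) when each word belongs to exactly one class."""
    c_prev = word2class[prev_word]
    c = word2class[word]
    # p(c | c_prev) * p(word | c); unseen events get probability 0 here.
    return p_cc.get((c_prev, c), 0.0) * p_wc.get((word, c), 0.0)


def soft_class_bigram(prev_word, word, p_cw, p_cc, p_wc):
    """p(word | prev_word) when the history word has a distribution over classes."""
    total = 0.0
    # Sum over every class of the history word and every class transition.
    for c_prev, weight in p_cw[prev_word].items():
        for (c_from, c_to), trans in p_cc.items():
            if c_from == c_prev:
                total += weight * trans * p_wc.get((word, c_to), 0.0)
    return total


# Hypothetical hard assignment: one class per word.
WORD2CLASS = {"to": "FUNC", "boston": "CITY", "denver": "CITY"}
# Hypothetical class transition probabilities p(c | c_prev).
P_CC = {("FUNC", "CITY"): 0.5, ("FUNC", "FUNC"): 0.5, ("CITY", "FUNC"): 1.0}
# Hypothetical class membership probabilities p(w | c), keyed by (word, class).
P_WC = {("boston", "CITY"): 0.6, ("denver", "CITY"): 0.4, ("to", "FUNC"): 1.0}
# Hypothetical soft assignment p(c | w) for the history word.
P_CW_SOFT = {"to": {"FUNC": 0.8, "CITY": 0.2}}

if __name__ == "__main__":
    print(hard_class_bigram("to", "boston", WORD2CLASS, P_CC, P_WC))
    print(soft_class_bigram("to", "boston", P_CW_SOFT, P_CC, P_WC))
```

When the soft assignment puts all of its mass on a single class, the soft estimate reduces to the hard one, so hard classing can be viewed as a special case of this formulation.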
Bibliographic reference. Sarikaya, Ruhi / Chen, Stanley F. / Sethy, Abhinav / Ramabhadran, Bhuvana (2010): "Impact of word classing on shrinkage-based language models", In INTERSPEECH-2010, 1804-1807.