ISCA Archive ISCSLP 2008

Position Information for Language Modeling in Speech Recognition

Hsuan-Sheng Chiu, Guan-Yu Chen, Chun-Jen Lee, Berlin Chen

This paper considers word position information for language modeling. For organized documents, such as technical papers or news reports, articles of the same style are usually similar in composition and word usage. Such documents can therefore be partitioned by their literary structure into segments that share a rhetorical or topical style, e.g., introductory remarks, related studies or events, elucidations of methodology or affairs, conclusions, and references or reporters' footnotes. In this paper, we explore word position information and propose two position-dependent language models for speech recognition. The structures and characteristics of these position-dependent language models were extensively investigated, and their performance was analyzed and verified by comparison with existing n-gram, mixture-based, and topic-based language models. Large vocabulary continuous speech recognition (LVCSR) experiments were conducted on a broadcast news transcription task. The preliminary results suggest that the proposed position-dependent models are comparable to the mixture- and topic-based models.

Index Terms: speech recognition, language model, position information, topic information, language model adaptation

1. INTRODUCTION

The language model (LM) plays a decisive role in many areas of natural language processing, such as machine translation, information retrieval, and speech recognition. The n-gram language model, which follows a statistical modeling paradigm, is the most prominently used language model in speech recognition because of its simplicity and predictive power. Nevertheless, the n-gram model captures only local contextual information, or the lexical regularity of a language, and therefore inevitably misses the information (either semantic or syntactic) conveyed in the history that precedes the n-1 words immediately before the newly decoded word.
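The locality limitation described above can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the toy sentence, and the use of unsmoothed maximum-likelihood estimates are all assumptions made for illustration. The point is only that an n-gram model discards everything in the history except the last n-1 words.

```python
from collections import Counter


def train_ngram(tokens, n):
    """Count n-grams and their (n-1)-word histories (hypothetical helper)."""
    starts = range(len(tokens) - n + 1)
    ngrams = Counter(tuple(tokens[i:i + n]) for i in starts)
    histories = Counter(tuple(tokens[i:i + n - 1]) for i in starts)
    return ngrams, histories


def ngram_prob(word, history, ngrams, histories, n):
    """Unsmoothed estimate of P(word | history).

    Only the last n-1 words of `history` are used; anything earlier is ignored.
    """
    context = tuple(history[-(n - 1):]) if n > 1 else ()
    denom = histories[context]
    return ngrams[context + (word,)] / denom if denom else 0.0


# Toy training text (an assumption, not data from the paper).
tokens = "the market rose today and the market fell today".split()
ngrams, histories = train_ngram(tokens, n=2)

# Two histories that differ everywhere except in the final word.
p_short = ngram_prob("rose", ["the", "market"], ngrams, histories, n=2)
p_long = ngram_prob("rose", ["prices", "fell", "so", "the", "market"],
                    ngrams, histories, n=2)
```

With n=2, both calls condition only on "market", so p_short and p_long are identical even though the longer history mentions falling prices. A practical model would add smoothing, which this sketch leaves out.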
Recently, various language modeling approaches have been extensively investigated to capture information relating the decoded word to its history and thereby complement the conventional n-gram model. According to the level of linguistic information utilized, these language models can be roughly classified into the following categories:

Cite as: Chiu, H.-S., Chen, G.-Y., Lee, C.-J., Chen, B. (2008) Position Information for Language Modeling in Speech Recognition. Proc. International Symposium on Chinese Spoken Language Processing, 101-104

  author={Hsuan-Sheng Chiu and Guan-Yu Chen and Chun-Jen Lee and Berlin Chen},
  title={{Position Information for Language Modeling in Speech Recognition}},
  booktitle={Proc. International Symposium on Chinese Spoken Language Processing},