ISCA Archive Interspeech 2009

Effective use of pause information in language modelling for speech recognition

Kengo Ohta, Masatoshi Tsuchiya, Seiichi Nakagawa

This paper addresses the mismatch between the speech processing units used by a speech recognizer and the sentences of its training corpora. A standard speech recognizer divides input speech into speech processing units based on power information, whereas the training corpora of language models are divided into sentences based on punctuation. A mismatch between speech processing units and sentences is therefore inevitable, and neither is optimal for a spontaneous speech recognition task. This paper presents two methods to address this problem. First, the words of the preceding units are used to predict the words of the succeeding units, in order to address the mismatch between speech processing units and optimal units. Second, we propose a method to build a language model that includes short pauses from a corpus with no short pauses, to address the mismatch between speech processing units and sentences. Their combination achieved a 4.5% relative improvement over the conventional method in a meeting speech recognition task.
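
The first idea in the abstract, using the words of the preceding unit as context for the next one, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `ngram_events`, the `<s>` and `<sp>` tokens, and the choice to place a pause token at each unit boundary are all assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

BOS = "<s>"   # assumed sentence-start padding token
SP = "<sp>"   # assumed short-pause token (hypothetical name)


def ngram_events(
    units: Sequence[Sequence[str]], n: int = 3, carry_over: bool = True
) -> List[Tuple[Tuple[str, ...], str]]:
    """Return (history, word) pairs for every word in every unit.

    With carry_over=False each unit starts from a fresh <s> history,
    as if it were an independent sentence. With carry_over=True the
    history runs across the unit boundary, and the boundary itself is
    represented by a pause token (an assumption of this sketch).
    """
    if n < 2:
        raise ValueError("n must be at least 2 to have a history")
    width = n - 1
    events: List[Tuple[Tuple[str, ...], str]] = []
    history: List[str] = []
    for unit in units:
        if not carry_over or not history:
            # Fresh start: pad the history with sentence-start tokens.
            history = [BOS] * width
        else:
            # Keep the tail of the previous unit and mark the pause.
            history = (history + [SP])[-width:]
        for word in unit:
            events.append((tuple(history), word))
            history = (history + [word])[-width:]
    return events
```

For two units `[["a", "b"], ["c"]]` and a trigram history, the independent treatment predicts `c` from `("<s>", "<s>")`, while the carried-over treatment predicts it from `("b", "<sp>")`, so the second unit is conditioned on the end of the first.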


doi: 10.21437/Interspeech.2009-126

Cite as: Ohta, K., Tsuchiya, M., Nakagawa, S. (2009) Effective use of pause information in language modelling for speech recognition. Proc. Interspeech 2009, 2691-2694, doi: 10.21437/Interspeech.2009-126

@inproceedings{ohta09_interspeech,
  author={Kengo Ohta and Masatoshi Tsuchiya and Seiichi Nakagawa},
  title={{Effective use of pause information in language modelling for speech recognition}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2691--2694},
  doi={10.21437/Interspeech.2009-126}
}