16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Combinations of Various Language Model Technologies Including Data Expansion and Adaptation in Spontaneous Speech Recognition

Ryo Masumura (1), Taichi Asami (1), Takanobu Oba (1), Hirokazu Masataki (1), Sumitaka Sakauchi (1), Akinori Ito (2)

(1) NTT Corporation, Japan
(2) Tohoku University, Japan

This paper examines combinations of various language model (LM) technologies for spontaneous speech recognition, covering not only modeling techniques but also training data expansion based on external language resources and unsupervised adaptation. Although combinations of various LM technologies have been examined before, previous work focused only on modeling techniques. In particular, it did not consider two other functionalities that are important in practical spontaneous language modeling: the use of external language resources and unsupervised LM adaptation. Our examination therefore employs not only manual transcriptions of target-domain speech but also out-of-domain text resources for spontaneous language modeling. In addition, unsupervised LM adaptation based on multi-pass decoding is aggressively introduced into the combination. Our experimental results on a Japanese spontaneous speech recognition task show a significant word error rate reduction when the technologies are combined, compared with using each technology individually. Furthermore, we reveal relationships between the technologies.


Bibliographic reference. Masumura, Ryo / Asami, Taichi / Oba, Takanobu / Masataki, Hirokazu / Sakauchi, Sumitaka / Ito, Akinori (2015): "Combinations of various language model technologies including data expansion and adaptation in spontaneous speech recognition", In INTERSPEECH-2015, 463-467.