16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Discriminative Bilinear Language Modeling for Broadcast Transcriptions

Akio Kobayashi, Manon Ichiki, Takahiro Oku, Kazuo Onoe, Shoei Sato

NHK, Japan

A discriminative bilinear language model (DBLM) estimated on the basis of Bayes risk minimization is described. A discriminative language model (DLM) is conventionally trained using n-gram features. However, given a large amount of training data, the DLM is not necessarily trained efficiently because the number of unique features grows. In addition, although some n-grams share the same word sequences as contexts, the DLM never exploits this information because the n-gram features are not designed to work in a coordinated manner. These disadvantages of n-gram features can reduce the robustness of the DLM. We address these issues by introducing a bilinear network structure over the features, which factorizes the contexts shared among the n-grams and allows the model to be estimated more robustly. In the proposed language modeling, all model parameters, such as the weight matrices, are estimated by minimizing a Bayes-risk-based objective on the training lattices. The experimental results show that the DBLM trained in a lightly supervised manner significantly reduced the word error rate compared with the trigram LM, whereas the conventional DLM did not yield a significant reduction.
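The two ideas named in the abstract, a bilinear score and a Bayes risk objective, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the model equations, so the score form c^T W w, the softmax posterior over a small hypothesis list (in place of training lattices), and all names and values below are assumptions chosen only for illustration.

```python
import math


def bilinear_score(context_vec, word_vec, weight):
    """Assumed bilinear form: score = c^T W w for context c and word w."""
    return sum(
        c * weight[i][j] * w
        for i, c in enumerate(context_vec)
        for j, w in enumerate(word_vec)
    )


def expected_risk(scores, errors):
    """Bayes risk as the posterior-weighted error count over hypotheses.

    Posteriors are a softmax over the scores (an assumption); the maximum
    score is subtracted first for numerical stability.
    """
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return sum(e / total * err for e, err in zip(exps, errors))


# Hypothetical toy data: one shared context, two competing hypotheses.
context = [1.0, 0.0]
hyp_words = [[1.0, 0.0], [0.0, 1.0]]
errors = [0, 2]  # word errors of each hypothesis against the reference

weight = [[1.0, 0.0], [0.0, 1.0]]
scores = [bilinear_score(context, w, weight) for w in hyp_words]
risk = expected_risk(scores, errors)

# Raising the weight that favors the error-free hypothesis lowers the risk.
better_weight = [[2.0, 0.0], [0.0, 1.0]]
better_scores = [bilinear_score(context, w, better_weight) for w in hyp_words]
better_risk = expected_risk(better_scores, errors)
```

In this sketch the weight matrix is shared by every hypothesis that uses the same context vector, which is the sense in which a bilinear form lets features with common contexts be estimated jointly; training would adjust the matrix to reduce the expected error, as the comparison of `risk` and `better_risk` suggests.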


Bibliographic reference.  Kobayashi, Akio / Ichiki, Manon / Oku, Takahiro / Onoe, Kazuo / Sato, Shoei (2015): "Discriminative bilinear language modeling for broadcast transcriptions", In INTERSPEECH-2015, 453-457.