16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Combining Multiple-Type Input Units Using Recurrent Neural Network for LVCSR Language Modeling

Vataya Chunwijitra, Ananlada Chotimongkol, Chai Wutiwiwatchai

NECTEC, Thailand

In this paper, we investigate the use of a Recurrent Neural Network (RNN) to combine hybrid input types, namely words and pseudo-morphemes (PMs), for Thai LVCSR language modeling. As with other neural network frameworks, an RNN places no restriction on its input types. To exploit this advantage, the input vector of the proposed hybrid RNN language model (RNNLM) is a concatenation of word and PM vectors. After first-pass decoding with an n-gram LM, the word-based lattice is expanded to include the corresponding PMs of each word. The hybrid RNNLM is then used to re-score the hybrid lattice in second-pass decoding. We tested our hybrid RNNLM on two recognition tasks: broadcast news transcription and mobile speech-to-speech translation. The proposed model achieved better recognition performance than a baseline word-based RNNLM, as hybrid input types provide more flexible unit choices for language model re-scoring. The computational complexity of a full-hybrid RNNLM can be reduced by limiting the input vector to include only frequent words and PMs. In a reduced-hybrid RNNLM, the size of the input vector can be halved, which considerably reduces both training and decoding time without affecting recognition accuracy.
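
As an illustration of the concatenated input described in the abstract, the sketch below builds a hybrid input vector from a one-hot word vector and a multi-hot PM vector. It is a minimal sketch under stated assumptions and not the authors' implementation: the function names (`build_index`, `hybrid_input`), the placeholder tokens, the multi-hot encoding of PMs, and the use of a shared `<unk>` slot for infrequent units (loosely mirroring the reduced-hybrid idea) are all hypothetical choices that the abstract does not specify.

```python
from typing import Dict, List, Sequence


def build_index(units: Sequence[str]) -> Dict[str, int]:
    """Map each unit to a position; position 0 is reserved for unknown or infrequent units."""
    index = {"<unk>": 0}
    for unit in units:
        if unit not in index:
            index[unit] = len(index)
    return index


def hybrid_input(
    word: str,
    pms: Sequence[str],
    word_index: Dict[str, int],
    pm_index: Dict[str, int],
) -> List[int]:
    """Concatenate a one-hot word vector with a multi-hot PM vector (assumed encoding)."""
    word_vec = [0] * len(word_index)
    word_vec[word_index.get(word, 0)] = 1  # units outside the vocabulary fall back to <unk>
    pm_vec = [0] * len(pm_index)
    for pm in pms:
        pm_vec[pm_index.get(pm, 0)] = 1
    return word_vec + pm_vec


# Hypothetical placeholder vocabularies; real ones would hold Thai words and PMs.
word_index = build_index(["w_a", "w_b"])
pm_index = build_index(["pm_x", "pm_y", "pm_z"])

vec = hybrid_input("w_b", ["pm_x", "pm_z"], word_index, pm_index)
print(vec)  # [0, 0, 1, 0, 1, 0, 1]
```

Restricting the two vocabularies to frequent units shrinks both halves of the vector, which is the intuition behind the reduced-hybrid variant; the actual frequency cut-offs are not given in the abstract.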


Bibliographic reference. Chunwijitra, Vataya / Chotimongkol, Ananlada / Wutiwiwatchai, Chai (2015): "Combining multiple-type input units using recurrent neural network for LVCSR language modeling", in INTERSPEECH-2015, 2385-2389.