This paper introduces a new approach that directly uses latent words language models (LWLMs) in automatic speech recognition (ASR). LWLMs are effective against data sparseness because of their soft-decision clustering structure and Bayesian modeling, so they can be expected to perform robustly across multiple ASR tasks. Unfortunately, applying an LWLM to ASR is difficult because of its computational complexity. In our previous work, we implemented an approximate LWLM for ASR by sampling words according to a stochastic process and training a word n-gram LM on them. However, that approach cannot take into account the latent variable sequence behind the recognition hypothesis. To solve this problem, we propose a method based on Viterbi decoding that simultaneously decodes the recognition hypothesis and its latent variable sequence. In the proposed method, we use Gibbs sampling for rapid decoding. Our experiments show the effectiveness of the proposed Viterbi decoding in n-best rescoring. Moreover, we also investigate the effect of combining the previous approximate LWLM with the proposed Viterbi decoding.
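The core idea in the abstract — inferring the latent variable sequence behind a hypothesis by Gibbs sampling — can be illustrated on a toy model. The sketch below is not the paper's implementation: the vocabularies, probabilities, and function names are all hypothetical. It assumes an LWLM where each observed word is emitted by a latent word and latent words follow a bigram model; each position's latent word is resampled from its conditional distribution given its neighbors and the observed word, and the best-scoring state seen is kept as the decoded latent sequence.

```python
import random
from itertools import product

# Toy latent words language model (LWLM): each observed word w_t is
# emitted by a latent word h_t; latent words follow a bigram model.
# All symbols and probabilities are illustrative, not from the paper.
LATENT = ["A", "B"]
TRANS = {  # P(h_t | h_{t-1}); "<s>" is the sentence-start symbol
    "<s>": {"A": 0.6, "B": 0.4},
    "A": {"A": 0.7, "B": 0.3},
    "B": {"A": 0.2, "B": 0.8},
}
EMIT = {  # P(w_t | h_t)
    "A": {"x": 0.9, "y": 0.1},
    "B": {"x": 0.2, "y": 0.8},
}

def joint(words, h):
    """Joint probability P(w, h) under the toy LWLM."""
    p, prev = 1.0, "<s>"
    for w, c in zip(words, h):
        p *= TRANS[prev][c] * EMIT[c][w]
        prev = c
    return p

def gibbs_decode(words, sweeps=200, seed=0):
    """Approximate the best latent sequence for `words` by Gibbs
    sampling, keeping the highest-joint-probability state seen."""
    rng = random.Random(seed)
    h = [rng.choice(LATENT) for _ in words]  # random initialization
    best, best_p = list(h), joint(words, h)
    for _ in range(sweeps):
        for t in range(len(words)):
            prev = h[t - 1] if t > 0 else "<s>"
            # Unnormalized conditional P(h_t | h_{-t}, w): transition
            # in, emission, and transition out (if a successor exists).
            scores = []
            for c in LATENT:
                s = TRANS[prev][c] * EMIT[c][words[t]]
                if t + 1 < len(words):
                    s *= TRANS[c][h[t + 1]]
                scores.append(s)
            # Sample h_t proportionally to the conditional scores.
            r = rng.random() * sum(scores)
            for c, s in zip(LATENT, scores):
                r -= s
                if r <= 0:
                    h[t] = c
                    break
        p = joint(words, h)
        if p > best_p:
            best, best_p = list(h), p
    return best, best_p

words = ["x", "x", "y"]
h, p = gibbs_decode(words)
# On this tiny model, exhaustive search confirms the sampled answer.
exact = max(product(LATENT, repeat=len(words)),
            key=lambda s: joint(words, list(s)))
```

On realistic vocabularies the conditional at each position ranges over the full latent vocabulary, which is why sampling (rather than exact enumeration, as done here only for verification) is attractive for rapid decoding.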
Bibliographic reference. Masumura, Ryo / Masataki, Hirokazu / Oba, Takanobu / Yoshioka, Osamu / Takahashi, Satoshi (2013): "Viterbi decoding for latent words language models using Gibbs sampling", In INTERSPEECH-2013, 3429-3433.