In contrast to conventional n-gram approaches, which are the most widely used language models in continuous speech recognition systems, the multigram approach models a stream of variable-length sequences. To overcome the independence assumption of the classical multigram, we propose in this paper a hierarchical model that successively relaxes this assumption. We call this model M_n^v. The estimation of the model parameters can be formulated as a maximum likelihood estimation problem from incomplete data, applied at the different levels (j in 1...v). We show that estimates of the model parameters can be computed through an iterative Expectation-Maximization algorithm. A few experimental tests were carried out on a corpus extracted from the French newspaper "Le Monde". Results show that M_n^v outperforms the classical multigram and the interpolated bigram, and is comparable to the interpolated trigram model.
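For readers unfamiliar with the baseline, the following is a minimal sketch of one Expectation-Maximization step for the classical multigram, that is, the model with the independence assumption, not the hierarchical model proposed in the paper. It assumes a single training string, a maximum sequence length `max_len`, and no smoothing or pruning. The function names (`init_uniform`, `forward`, `backward`, `em_step`) are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def init_uniform(words, max_len):
    """Give equal probability to every sequence of up to max_len words seen in the data."""
    seqs = {tuple(words[i:i + k])
            for i in range(len(words))
            for k in range(1, max_len + 1)
            if i + k <= len(words)}
    return {seq: 1.0 / len(seqs) for seq in seqs}


def forward(words, probs, max_len):
    """alpha[t] = summed probability of all segmentations of words[:t]."""
    alpha = [0.0] * (len(words) + 1)
    alpha[0] = 1.0
    for t in range(1, len(words) + 1):
        for k in range(1, min(max_len, t) + 1):
            alpha[t] += alpha[t - k] * probs.get(tuple(words[t - k:t]), 0.0)
    return alpha


def backward(words, probs, max_len):
    """beta[t] = summed probability of all segmentations of words[t:]."""
    n = len(words)
    beta = [0.0] * (n + 1)
    beta[n] = 1.0
    for t in range(n - 1, -1, -1):
        for k in range(1, min(max_len, n - t) + 1):
            beta[t] += probs.get(tuple(words[t:t + k]), 0.0) * beta[t + k]
    return beta


def em_step(words, probs, max_len):
    """One EM iteration: expected sequence counts (E-step), then renormalization (M-step)."""
    n = len(words)
    alpha = forward(words, probs, max_len)
    beta = backward(words, probs, max_len)
    likelihood = alpha[n]
    counts = defaultdict(float)
    for t in range(n):
        for k in range(1, min(max_len, n - t) + 1):
            seq = tuple(words[t:t + k])
            p = probs.get(seq, 0.0)
            if p > 0.0:
                # Posterior probability that seq is used as a segment starting at position t.
                counts[seq] += alpha[t] * p * beta[t + k] / likelihood
    total = sum(counts.values())
    return {seq: c / total for seq, c in counts.items()}
```

Starting from `init_uniform(words, max_len)` and calling `em_step` repeatedly re-estimates the sequence probabilities; `forward(words, probs, max_len)[-1]` gives the likelihood of the string summed over all segmentations, which should not decrease from one iteration to the next.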
Cite as: Zitouni, I. (1998) A language modeling based on a hierarchical approach: m_n^v. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0727, doi: 10.21437/ICSLP.1998-841
@inproceedings{zitouni98b_icslp,
  author={Imed Zitouni},
  title={{A language modeling based on a hierarchical approach: m_n^v}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0727},
  doi={10.21437/ICSLP.1998-841}
}