Sixth European Conference on Speech Communication and Technology
This paper presents a stochastic language model that splits the word history into its d individual words, where d is the length of the history. One of our aims is to model the semantic and syntactic relationships between words, and this model can be considered a first step toward that goal. We evaluated the model with the Shannon game (on 10,000 truncated sentences) and implemented it in MAUD, our dictation machine. MAUD was tested on 300 sentences spoken by several women and men. In the Shannon game, this model predicts more words than any other method previously developed in our team, even though those earlier models are more sophisticated than the one described here. Moreover, when unknown words are included, the results are better than those of the models we presented in recent work in terms of mean rank, ranks from 1 to 5, and perplexity. This work required two interpolation methods inspired by Markov models. We also discuss the problem of modelling unknown words.
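The evaluation measures named in the abstract (mean rank, ranks from 1 to 5, and perplexity) can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `shannon_game_metrics`, the toy distributions, and the reading of "ranks from 1 to 5" as the share of target words ranked in the top five are all assumptions made for illustration. Each truncated sentence is assumed to yield a probability distribution over candidate next words that contains the true word.

```python
import math

def shannon_game_metrics(predictions, targets, top_n=5):
    """Score next-word predictions (hypothetical helper, not from the paper).

    predictions: list of dicts mapping candidate word -> probability.
    targets: list of the true next words, one per truncated sentence.
    Assumes every target appears in its distribution.
    """
    ranks = []
    log_prob_sum = 0.0
    for dist, target in zip(predictions, targets):
        # Order candidates by decreasing probability; rank 1 is the best guess.
        ordered = sorted(dist, key=dist.get, reverse=True)
        ranks.append(ordered.index(target) + 1)
        log_prob_sum += math.log2(dist[target])
    n = len(ranks)
    return {
        "mean_rank": sum(ranks) / n,
        # Assumed reading of "ranks from 1 to 5": share of targets in the top 5.
        "top_n_rate": sum(r <= top_n for r in ranks) / n,
        # Standard perplexity: 2 to the power of the average negative log2 probability.
        "perplexity": 2 ** (-log_prob_sum / n),
    }

# Toy data: two truncated sentences whose true next word is "cat".
predictions = [
    {"cat": 0.5, "dog": 0.3, "car": 0.2},
    {"dog": 0.5, "cat": 0.3, "car": 0.2},
]
targets = ["cat", "cat"]
metrics = shannon_game_metrics(predictions, targets)
```

A lower mean rank and a lower perplexity both indicate that the model places the true word closer to the top of its predictions.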
Bibliographic reference. Langlois, D. / Smadli, K. (1999): "A new based distance language model for a dictation machine: application to MAUD", In EUROSPEECH'99, 1779-1782.