7th International Conference on Spoken Language Processing
September 16-20, 2002
This paper focuses on statistical language modeling for automatic speech recognition. We present a method for finding linguistic units in a corpus. This method, called the Selected History Principle, consists of finding strong distant relationships between words. The new units are phrases made up of basic units of our vocabulary linked by these distant relationships. We adapt the multigram principle to large vocabularies in order to introduce an optimal subset of these sequences into a bigram model. The bigram model using these sequences outperforms the baseline bigram model by 21% in terms of perplexity and increases the recognition rate of the large-vocabulary system Sirocco by 8.7%. The word error rate decreases by 12.7%.
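To make the idea of adding phrase units to a bigram model concrete, the following is a minimal sketch and not the method of the paper. It assumes that the phrases have already been selected and that they are contiguous word sequences, whereas the Selected History Principle and the multigram-based selection are not reproduced here. The function names `merge_phrases` and `bigram_perplexity`, the underscore joining convention, and the add-one smoothing are all illustrative assumptions.

```python
import math
from collections import Counter


def merge_phrases(tokens, phrases):
    """Greedily replace each listed word sequence with a single unit.

    Assumption: phrases are contiguous sequences chosen beforehand;
    the selection step described in the abstract is not modeled.
    """
    # Try longer phrases first so that they win over their own prefixes.
    ordered = sorted((tuple(p) for p in phrases if p), key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for p in ordered:
            if tuple(tokens[i:i + len(p)]) == p:
                out.append("_".join(p))  # hypothetical joining convention
                i += len(p)
                break
        else:
            # No phrase starts at position i: keep the word as it is.
            out.append(tokens[i])
            i += 1
    return out


def bigram_perplexity(train, test, n_words=None):
    """Perplexity of an add-one smoothed bigram model on `test`.

    `n_words` sets the normalization length. Passing the number of
    predicted original words keeps scores comparable between a word
    model and a phrase model, whose sequences are shorter.
    """
    vocab = set(train) | set(test)
    history_counts = Counter(train[:-1])
    bigram_counts = Counter(zip(train, train[1:]))
    log_prob = 0.0
    for prev, cur in zip(test, test[1:]):
        p = (bigram_counts[(prev, cur)] + 1) / (history_counts[prev] + len(vocab))
        log_prob += math.log(p)
    n = n_words if n_words is not None else len(test) - 1
    return math.exp(-log_prob / n)
```

A comparison in this spirit would apply `merge_phrases` to both the training and the test text, then call `bigram_perplexity` on the merged sequences with `n_words` set to the number of predicted words in the unmerged test text, so that both models are normalized over the same length.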
Bibliographic reference. Langlois, David / Smaïli, Kamel / Haton, Jean-Paul (2002): "Retrieving phrases by selecting the history: application to automatic speech recognition", In ICSLP-2002, 721-724.