9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Improving the Multigram Algorithm by Using Lattices as Input

Joris Driesen, Hugo Van hamme

Katholieke Universiteit Leuven, Belgium

The multigram algorithm is a statistical technique for extracting recurring patterns from sequential input. When given a symbol sequence representing a speech signal, it can extract word-like patterns from it, despite the large number of different subsequences that can represent a single word. To do so, it relies on statistical information derived from the entire input. However, because speech is abstracted to a sequence of symbols, much of the information originally present in the signal is no longer available to the algorithm.
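To illustrate how variable-length patterns can emerge from corpus-wide statistics, here is a minimal sketch of multigram-style training on a plain symbol sequence. It assumes a simplified Viterbi-training variant: each iteration finds the single best segmentation into units of length 1 to `max_len` under the current unit probabilities, then re-estimates those probabilities from the resulting hard counts. The full algorithm may instead use expected counts over all segmentations. The function names (`init_probs`, `viterbi_segment`, `train_multigram`) and the `max_len`/`iterations` parameters are hypothetical illustrations, and the input here is a single string of symbols, not a lattice.

```python
import math
from collections import Counter


def init_probs(seq, max_len):
    """Hypothetical initialisation: relative frequency of every
    subsequence of length 1..max_len in the input."""
    counts = Counter(
        tuple(seq[i:i + length])
        for length in range(1, max_len + 1)
        for i in range(len(seq) - length + 1)
    )
    total = sum(counts.values())
    return {unit: c / total for unit, c in counts.items()}


def viterbi_segment(seq, probs, max_len):
    """Most probable segmentation of seq into units of length 1..max_len,
    scoring a segmentation by the sum of its units' log-probabilities."""
    n = len(seq)
    best = [-math.inf] * (n + 1)  # best[t]: best log-score of seq[:t]
    back = [0] * (n + 1)          # back[t]: length of last unit ending at t
    best[0] = 0.0
    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            p = probs.get(tuple(seq[end - length:end]))
            if not p:  # unknown or zero-probability unit
                continue
            score = best[end - length] + math.log(p)
            if score > best[end]:
                best[end] = score
                back[end] = length
    # Trace back the chosen unit boundaries from the end of the sequence.
    units, end = [], n
    while end > 0:
        length = back[end]
        units.append(tuple(seq[end - length:end]))
        end -= length
    units.reverse()
    return units


def train_multigram(seq, max_len=3, iterations=5):
    """Alternate segmentation and re-estimation (Viterbi approximation).
    The previous segmentation stays feasible after re-estimation, so every
    iteration still finds a complete segmentation."""
    probs = init_probs(seq, max_len)
    for _ in range(iterations):
        counts = Counter(viterbi_segment(seq, probs, max_len))
        total = sum(counts.values())
        probs = {unit: c / total for unit, c in counts.items()}
    return probs, viterbi_segment(seq, probs, max_len)


if __name__ == "__main__":
    probs, segmentation = train_multigram(list("abcabcabc"), max_len=3)
    print(segmentation)  # [('a', 'b', 'c'), ('a', 'b', 'c'), ('a', 'b', 'c')]
```

On the toy input, the recurring unit `abc` wins because a segmentation with fewer, more frequent units accumulates less negative log-probability. This is the same pressure that lets recurring word-like patterns surface from a long symbol sequence.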

In this paper, we propose using a richer abstraction of the signal in the form of a lattice. We also present a way of grounding recurring patterns in concepts from other modalities. Finally, we test the information learned by the algorithm from both kinds of input in a recognition experiment, which shows that using lattices leads to a significant improvement in recognition rate.

