Within the context of a deployed spoken dialog service, this study presents a new interpretation strategy based on the sequential use of different ASR output representations: 1-best strings, word lattices, and confusion networks. The goal is to reject non-relevant messages, those containing non-speech or out-of-domain content, as early as possible in the decoding process. This first rejection is performed during the first pass of the ASR decoding process, using specific acoustic and language models. A confusion network (CN) is then computed for the remaining messages, and a second rejection process is applied using the confidence measures obtained from the CN. The messages kept at this stage are considered relevant; the search for the best interpretation is therefore applied to a search space richer than the 1-best word string alone: either the whole CN or the whole word lattice. An improved, SLU-oriented CN generation algorithm is also proposed that significantly reduces the size of the resulting CN while improving recognition performance. This strategy is evaluated on a large corpus of real users' messages collected from a deployed service.
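The sequential strategy described above can be pictured as a three-stage cascade. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the `FirstPassResult` flags, the mean-of-top-posteriors confidence measure, the `0.5` threshold, and the toy `CONCEPTS` keyword table are all hypothetical stand-ins for the models and measures used in the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A confusion network (CN): a sequence of slots, each slot holding
# competing (word, posterior) hypotheses.
ConfusionNetwork = List[List[Tuple[str, float]]]

# Hypothetical keyword-to-concept table standing in for a real SLU model.
CONCEPTS: Dict[str, str] = {"bill": "BILLING", "invoice": "BILLING", "cancel": "CANCEL"}


@dataclass
class FirstPassResult:
    one_best: str
    is_speech: bool   # assumed output of the acoustic rejection model
    in_domain: bool   # assumed output of the language rejection model


def cn_confidence(cn: ConfusionNetwork) -> float:
    """Mean posterior of the top word in each slot (one simple, assumed measure)."""
    if not cn:
        return 0.0
    return sum(max(p for _, p in slot) for slot in cn) / len(cn)


def interpret(first_pass: FirstPassResult,
              cn: ConfusionNetwork,
              threshold: float = 0.5) -> Optional[str]:
    """Return a concept label, "UNKNOWN", or None if the message is rejected."""
    # Stage 1: early rejection of non-speech or out-of-domain messages.
    if not (first_pass.is_speech and first_pass.in_domain):
        return None
    # Stage 2: rejection based on confidence measures taken from the CN.
    if cn_confidence(cn) < threshold:
        return None
    # Stage 3: search the whole CN, not only the 1-best string.
    scores: Dict[str, float] = {}
    for slot in cn:
        for word, posterior in slot:
            concept = CONCEPTS.get(word)
            if concept is not None:
                scores[concept] = max(scores.get(concept, 0.0), posterior)
    if not scores:
        return "UNKNOWN"
    return max(scores, key=lambda c: scores[c])
```

In this toy setup, a message whose 1-best string is "pay will" can still be mapped to `BILLING` when "bill" appears as a lower-ranked alternative in the CN, which illustrates why searching the richer representation can help once a message has been judged relevant.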
Bibliographic reference. Minescu, Bogdan / Damnati, Géraldine / Béchet, Frédéric / De Mori, Renato (2007): "Conditional use of word lattices, confusion networks and 1-best string hypotheses in a sequential interpretation strategy", In INTERSPEECH-2007, 1617-1620.