Spoken Language Understanding performs automatic concept labeling and segmentation of speech utterances. Many approaches to this task have been proposed, based on both generative and discriminative models. While all these methods have shown remarkable accuracy on manual transcriptions of spoken utterances, robustness to noisy automatic transcriptions is still an open issue. In this paper we study algorithms for Spoken Language Understanding that combine complementary learning models: Stochastic Finite State Transducers produce a list of hypotheses, which are re-ranked using a discriminative algorithm based on kernel methods. Our experiments on two different spoken dialog corpora, MEDIA and LUNA, show that the combined generative-discriminative model reaches the accuracy of state-of-the-art models such as Conditional Random Fields (CRF) on manual transcriptions and is robust to noisy automatic transcriptions, in some cases outperforming the state of the art.
Bibliographic reference. Dinarelli, Marco / Moschitti, Alessandro / Riccardi, Giuseppe (2009): "Concept segmentation and labeling for conversational speech", In INTERSPEECH-2009, 2747-2750.