11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Cross-Lingual Spoken Language Understanding from Unaligned Data Using Discriminative Classification Models and Machine Translation

Fabrice Lefèvre (1), François Mairesse (2), Steve Young (2)

(1) LIA, France
(2) University of Cambridge, UK

This paper investigates several approaches to bootstrapping a new spoken language understanding (SLU) component in a target language, given a large dataset of semantically annotated utterances in another source language. The aim is to reduce the cost of porting a spoken dialogue system from one language to another by minimising the amount of data required in the target language. Since word-level semantic annotations are costly, Semantic Tuple Classifiers (STCs) are used in conjunction with statistical machine translation models, both of which are trained from unaligned data, to further reduce development time. The paper presents experiments in which a French SLU component in the tourist information domain is bootstrapped from English data. Results show that training STCs on automatically translated data produces the best performance for predicting the utterance's dialogue act type; however, individual slot/value pairs are best predicted by training STCs on the source language and using them to decode translated utterances.
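
The two bootstrapping strategies compared above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the word-for-word lookup tables stand in for the statistical machine translation models, the word-count scorer stands in for the STCs, and every name, vocabulary item, and training example here is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon standing in for a statistical MT model.
EN_TO_FR = {
    "cheap": "économique",
    "hotel": "hôtel",
    "restaurant": "restaurant",
    "thanks": "merci",
    "goodbye": "salut",
}
FR_TO_EN = {fr: en for en, fr in EN_TO_FR.items()}


def translate(utterance, table):
    """Word-for-word lookup; unknown words are passed through unchanged."""
    return " ".join(table.get(word, word) for word in utterance.split())


def train(examples):
    """Count words per label (a simple stand-in for a discriminative classifier)."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.split())
    return counts


def predict(model, utterance):
    """Return the label whose training words overlap most with the utterance."""
    words = utterance.split()
    return max(sorted(model), key=lambda label: sum(model[label][w] for w in words))


# Hypothetical source-language (English) data labelled with dialogue act types.
english_data = [
    ("cheap hotel", "inform"),
    ("cheap restaurant", "inform"),
    ("thanks goodbye", "bye"),
    ("goodbye", "bye"),
]
french_utterance = "hôtel économique"

# Strategy A: translate the training data, then train a target-language model.
french_model = train([(translate(t, EN_TO_FR), label) for t, label in english_data])
pred_train_on_translated = predict(french_model, french_utterance)

# Strategy B: train on the source language, then decode the translated utterance.
english_model = train(english_data)
pred_decode_translated = predict(english_model, translate(french_utterance, FR_TO_EN))
```

In this toy setting both strategies agree; the abstract reports that, on real data, the first performs best for dialogue act types and the second for individual slot/value pairs.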

Bibliographic reference. Lefèvre, Fabrice / Mairesse, François / Young, Steve (2010): "Cross-lingual spoken language understanding from unaligned data using discriminative classification models and machine translation", in INTERSPEECH-2010, 78-81.