Speech understanding through concept classification offers one possible approach to machine translation in speech-to-speech translation systems and can be used in conjunction with conventional statistical machine translation. While correct concept classification promises well-formed target-language speech output, the approach does not scale well to a large number of concepts. It is also critical to know when to accept or reject the classifier's output. We formulate speech classification as a maximum a posteriori (MAP) estimation problem to derive the understanding model, and we improve its performance by incorporating dialog context information. Specifically, for a two-way speech translation system, we derive a classification scheme that uses context information from both sides of the conversation through an n-gram dialog model. The method was evaluated on data from an English-Farsi trans-lingual doctor-patient dialog system, and its classification and rejection accuracies were compared with those of a baseline system that uses the understanding model only. Incorporating context through the proposed dialog model yielded a modest improvement in classification accuracy (about 5% relative error reduction) and a significant improvement in rejection accuracy (up to 31.4% relative error reduction).
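As a rough illustration of how a MAP decision can combine an understanding model with dialog context, the sketch below multiplies a classifier's concept posterior P(c|X) by a bigram dialog model P(c|c_prev) trained on concept sequences that interleave both speakers' turns, then rejects the utterance when the best posterior falls below a threshold. It assumes the speech X is conditionally independent of the dialog history given the concept, so that P(c|X,h) is proportional to P(c|X)P(c|h)/P(c). The bigram order, add-k smoothing, the threshold value, and every name here (BigramDialogModel, classify_with_context, the concept labels) are hypothetical choices for illustration, not the paper's exact formulation.

```python
from collections import Counter, defaultdict


class BigramDialogModel:
    """Hypothetical bigram dialog model P(c_t | c_{t-1}) over concept labels.

    Training sequences interleave concepts from BOTH speakers in turn order,
    so the previous concept may come from the other side of the conversation.
    """

    def __init__(self, concepts, k=1.0):
        self.concepts = list(concepts)
        self.k = k  # add-k smoothing constant (an assumption)
        self.bigrams = defaultdict(Counter)
        self.unigrams = Counter()

    def train(self, dialogs):
        for dialog in dialogs:
            prev = "<s>"  # dialog-start symbol
            for c in dialog:
                self.bigrams[prev][c] += 1
                self.unigrams[c] += 1
                prev = c

    def prob(self, c, prev):
        # Smoothed conditional probability P(c | prev).
        counts = self.bigrams[prev]
        total = sum(counts.values())
        return (counts[c] + self.k) / (total + self.k * len(self.concepts))

    def prior(self, c):
        # Smoothed unigram prior P(c).
        total = sum(self.unigrams.values())
        return (self.unigrams[c] + self.k) / (total + self.k * len(self.concepts))


def classify_with_context(understanding_post, dialog_model, prev_concept, threshold=0.5):
    """MAP concept decision with rejection (illustrative sketch).

    Scores each concept as P(c|X) * P(c|prev) / P(c), renormalizes the scores
    into a posterior, and returns (None, posterior) if the best posterior is
    below `threshold`, i.e. the classifier output is rejected.
    """
    scores = {
        c: p * dialog_model.prob(c, prev_concept) / dialog_model.prior(c)
        for c, p in understanding_post.items()
    }
    z = sum(scores.values())
    posterior = {c: s / z for c, s in scores.items()}
    best = max(posterior, key=posterior.get)
    if posterior[best] < threshold:
        return None, posterior
    return best, posterior


concepts = ["ask_pain", "report_pain", "ask_allergy", "report_allergy"]
model = BigramDialogModel(concepts)
model.train([
    ["ask_pain", "report_pain", "ask_allergy", "report_allergy"],
    ["ask_allergy", "report_allergy", "ask_pain", "report_pain"],
])

# The understanding model alone cannot separate the two pain concepts.
understanding_post = {"ask_pain": 0.4, "report_pain": 0.4,
                      "ask_allergy": 0.1, "report_allergy": 0.1}
best, posterior = classify_with_context(understanding_post, model, prev_concept="ask_pain")
print(best)  # report_pain
```

In this toy example the other speaker's previous concept ("ask_pain") breaks the tie in favor of "report_pain" with a posterior of about 0.67, while raising the threshold to 0.9 would cause the same utterance to be rejected; this mirrors, in a simplified way, how context can help both classification and the accept/reject decision.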
Cite as: Ettelaie, E., Georgiou, P.G., Narayanan, S. (2006) Cross-lingual dialog model for speech to speech translation. Proc. Interspeech 2006, paper 1858-Tue2CaP.7, doi: 10.21437/Interspeech.2006-356
@inproceedings{ettelaie06_interspeech, author={Emil Ettelaie and Panayiotis G. Georgiou and Shrikanth Narayanan}, title={{Cross-lingual dialog model for speech to speech translation}}, year=2006, booktitle={Proc. Interspeech 2006}, pages={paper 1858-Tue2CaP.7}, doi={10.21437/Interspeech.2006-356} }