This paper describes an experiment in designing, implementing, and optimizing a speech-to-speech translation system based solely on an appropriate combination of currently available commercial components for speech recognition, machine translation, and speech synthesis. We investigated basic feasibility and the performance improvement obtainable by domain adaptation. A distributed architecture was chosen to implement an experimental system supporting full-duplex communication. In parallel, we analysed which application domains are useful and suitable for this system infrastructure. For optimization we then investigated how speech recognition accuracy can be improved by adaptation to the chosen limited domain (e.g. hotel reservation). This was done through speaker adaptation of the acoustic model and, more importantly, domain-specific adaptation of the language model. Two approaches to LM adaptation were compared: statistical n-grams and context-free grammars. Evaluation in conversation tests shows significant improvements for both approaches: word accuracy rose, e.g., from 75% to 92% using optimised n-grams and to 91% using a CFG. Pros and cons with respect to overall system performance and applicability are discussed in detail.
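The paper's LM adaptation was done with commercial components, but the statistical n-gram side can be sketched in a few lines. The following is a minimal add-alpha smoothed bigram model trained on hypothetical hotel-reservation sentences; the function names, the smoothing choice, and the example corpus are illustrative assumptions, not the paper's actual method or data:

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences,
    adding sentence-boundary markers <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, w1, w2, alpha=1.0):
    """Add-alpha smoothed conditional probability P(w2 | w1)."""
    vocab_size = len(unigrams)
    return (bigrams[(w1, w2)] + alpha) / (unigrams[w1] + alpha * vocab_size)

# Hypothetical in-domain training material (hotel reservation):
domain_corpus = [
    "i would like to book a room",
    "book a room for two nights",
]
uni, bi = train_bigram_lm(domain_corpus)
```

Training on such domain text biases the model toward in-domain word sequences, e.g. P("a" | "book") comes out higher than P("nights" | "book") here, which is the mechanism by which domain-specific n-grams raise recognition accuracy in a limited domain.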
Cite as: Stier, M., Feldes, S. (2005) Domain adaptation of a distributed speech-to-speech translation system. Proc. Applied Spoken Language Interaction in Distributed Environments (ASIDE 2005), paper 10
@inproceedings{stier05_aside,
  author    = {Michael Stier and Stefan Feldes},
  title     = {{Domain adaptation of a distributed speech-to-speech translation system}},
  booktitle = {Proc. Applied Spoken Language Interaction in Distributed Environments (ASIDE 2005)},
  year      = {2005},
  pages     = {paper 10}
}