This paper presents a method for adaptively re-ranking paraphrases in a Spoken Dialogue System (SDS) according to their predicted Text-to-Speech (TTS) quality. We collect data under four different conditions and extract a rich set of 55 TTS runtime features. We build predictive models of user ratings using linear regression with latent variables, and then show on a separate test set that these models transfer to a more specific target domain. All our models significantly outperform a random baseline. Our best-performing model matches the performance reported in previous work while requiring 75% less annotated training data. The TTS re-ranking model is part of an end-to-end statistical architecture for Spoken Dialogue Systems developed by the EC FP7 CLASSiC project.
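The re-ranking step described above can be illustrated with a minimal sketch. It assumes a linear model has already been fitted; the feature names, weights, intercept, and candidate prompts below are hypothetical placeholders, not the paper's 55 TTS runtime features or its latent-variable regression.

```python
from typing import Dict, List, Tuple

# Hypothetical weights of an already-fitted linear model. The feature names
# are illustrative only and are not taken from the paper's feature set.
WEIGHTS: Dict[str, float] = {
    "join_cost": -0.8,
    "num_units": -0.1,
    "duration_s": 0.05,
}
INTERCEPT = 4.0  # hypothetical intercept of the rating model


def predict_rating(features: Dict[str, float]) -> float:
    """Predict a user rating as a linear combination of TTS features."""
    return INTERCEPT + sum(
        weight * features.get(name, 0.0) for name, weight in WEIGHTS.items()
    )


def rerank(
    candidates: List[Tuple[str, Dict[str, float]]]
) -> List[Tuple[str, float]]:
    """Sort paraphrases from highest to lowest predicted rating."""
    scored = [(text, predict_rating(feats)) for text, feats in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Two made-up paraphrases of the same system prompt with made-up feature values.
candidates = [
    ("Which price range would you like?",
     {"join_cost": 1.5, "num_units": 6, "duration_s": 2.0}),
    ("What price range do you want?",
     {"join_cost": 0.5, "num_units": 5, "duration_s": 1.8}),
]
ranked = rerank(candidates)
for text, score in ranked:
    print(f"{score:.2f}  {text}")
```

In a dialogue system, the top-ranked paraphrase would be the one passed to the synthesizer; a real model would be trained on user ratings rather than using fixed weights.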
Bibliographic reference. Boidin, Cédric / Rieser, Verena / Plas, Lonneke van der / Lemon, Oliver / Chevelu, Jonathan (2009): "Predicting how it sounds: re-ranking dialogue prompts based on TTS quality for adaptive spoken dialogue systems", In INTERSPEECH-2009, 2487-2490.