10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Predicting How it Sounds: Re-Ranking Dialogue Prompts Based on TTS Quality for Adaptive Spoken Dialogue Systems

Cédric Boidin (1), Verena Rieser (2), Lonneke van der Plas (3), Oliver Lemon (2), Jonathan Chevelu (1)

(1) Orange Labs, France
(2) University of Edinburgh, UK
(3) Université de Genève, Switzerland

This paper presents a method for adaptively re-ranking paraphrases in a Spoken Dialogue System (SDS) according to their predicted Text-to-Speech (TTS) quality. We collect data under four different conditions and extract a rich set of 55 TTS runtime features. We build predictive models of user ratings using linear regression with latent variables. We then show that these models transfer to a more specific target domain on a separate test set. All our models significantly outperform a random baseline. Our best-performing model matches the performance reported in previous work while requiring 75% less annotated training data. The TTS re-ranking model is part of an end-to-end statistical architecture for Spoken Dialogue Systems developed by the EC FP7 CLASSiC project.
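
As a rough illustration of the re-ranking step described in the abstract, the sketch below scores each candidate paraphrase with a linear model over TTS runtime features and sorts the candidates by predicted user rating. It is not the paper's implementation: the feature names, weights, bias, and example prompts are hypothetical placeholders, and plain linear scoring stands in for the latent-variable regression the abstract mentions.

```python
from typing import Dict, List, Tuple

# Hypothetical linear model. In the paper the model is learned from user
# ratings over 55 TTS runtime features; these three names and values are
# invented purely for illustration.
WEIGHTS: Dict[str, float] = {
    "join_cost": -0.8,
    "pitch_discontinuity": -0.5,
    "units_per_word": -0.1,
}
BIAS = 4.0


def predict_quality(features: Dict[str, float]) -> float:
    """Return bias + sum(weight * feature value); missing features count as 0."""
    return BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())


def rerank(candidates: List[Tuple[str, Dict[str, float]]]) -> List[Tuple[str, float]]:
    """Score every (prompt, features) pair and sort by predicted quality, best first."""
    scored = [(text, predict_quality(feats)) for text, feats in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Two hypothetical paraphrases of the same system prompt, each paired with
# the (made-up) feature values a TTS engine might report for it.
CANDIDATES = [
    ("There are three Italian restaurants nearby.",
     {"join_cost": 1.0, "pitch_discontinuity": 0.5, "units_per_word": 2.0}),
    ("I found three Italian restaurants close to you.",
     {"join_cost": 0.2, "pitch_discontinuity": 0.1, "units_per_word": 3.0}),
]

if __name__ == "__main__":
    for text, score in rerank(CANDIDATES):
        print(f"{score:.2f}  {text}")
```

Under these assumed weights, the dialogue system would speak the first prompt in the returned list, since it has the highest predicted rating.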

Bibliographic reference. Boidin, Cédric / Rieser, Verena / Plas, Lonneke van der / Lemon, Oliver / Chevelu, Jonathan (2009): "Predicting how it sounds: re-ranking dialogue prompts based on TTS quality for adaptive spoken dialogue systems", in INTERSPEECH-2009, 2487-2490.