ISCA Archive Interspeech 2009

Predicting how it sounds: re-ranking dialogue prompts based on TTS quality for adaptive spoken dialogue systems

Cédric Boidin, Verena Rieser, Lonneke van der Plas, Oliver Lemon, Jonathan Chevelu

This paper presents a method for adaptively re-ranking paraphrases in a Spoken Dialogue System (SDS) according to their predicted Text-to-Speech (TTS) quality. We collect data under four different conditions and extract a rich feature set of 55 TTS runtime features. We build predictive models of user ratings using linear regression with latent variables. We then show that these models transfer to a more specific target domain on a separate test set. All our models significantly outperform a random baseline. Our best-performing model reaches the same performance as reported by previous work, but it requires 75% less annotated training data. The TTS re-ranking model is part of an end-to-end statistical architecture for Spoken Dialogue Systems developed by the EC FP7 CLASSiC project.
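
The re-ranking step described in the abstract can be illustrated with a minimal sketch. It assumes a linear model has already been fitted, and it uses plain linear scoring rather than the latent-variable regression the paper reports. The feature names (`join_cost`, `num_units`), the weights, and the candidate prompts are all hypothetical placeholders, not values from the paper.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]


def predict_rating(features: Features, weights: Features, bias: float) -> float:
    """Predicted user rating as a weighted sum of TTS runtime features."""
    return bias + sum(weights.get(name, 0.0) * value
                      for name, value in features.items())


def rerank(candidates: List[Tuple[str, Features]],
           weights: Features, bias: float) -> List[Tuple[str, float]]:
    """Score every paraphrase and return them best-first."""
    scored = [(text, predict_rating(feats, weights, bias))
              for text, feats in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights: higher join cost and more units lower the rating.
weights = {"join_cost": -0.5, "num_units": -0.1}
bias = 4.0

# Hypothetical paraphrases of one system prompt with made-up feature values.
candidates = [
    ("Which restaurant would you like?", {"join_cost": 2.0, "num_units": 5.0}),
    ("What restaurant do you want?", {"join_cost": 0.5, "num_units": 6.0}),
]

ranked = rerank(candidates, weights, bias)
best_prompt = ranked[0][0]
```

In this sketch the dialogue system would synthesise `best_prompt`, the paraphrase with the highest predicted rating.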


doi: 10.21437/Interspeech.2009-662

Cite as: Boidin, C., Rieser, V., Plas, L.v.d., Lemon, O., Chevelu, J. (2009) Predicting how it sounds: re-ranking dialogue prompts based on TTS quality for adaptive spoken dialogue systems. Proc. Interspeech 2009, 2487-2490, doi: 10.21437/Interspeech.2009-662

@inproceedings{boidin09b_interspeech,
  author={Cédric Boidin and Verena Rieser and Lonneke van der Plas and Oliver Lemon and Jonathan Chevelu},
  title={{Predicting how it sounds: re-ranking dialogue prompts based on TTS quality for adaptive spoken dialogue systems}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2487--2490},
  doi={10.21437/Interspeech.2009-662}
}