JNDSLAM: A SLAM extension for speech synthesis

Rasmus Dall, Xavi Gonzalvo


Pitch movement is a large component of speech prosody, yet although it is directly modelled in statistical parametric speech synthesis systems, these systems still produce very flat intonation contours. We present an open-source, fully data-driven approach to pitch contour stylisation suitable for speech synthesis, based on the SLAM approach. Modifications are proposed based on the Just Noticeable Difference (JND) in pitch and tailored to the needs of speech synthesis for describing the movement of the pitch. In an anchored Mean Opinion Score (MOS) test using oracle labels, the proposed method shows an improvement over standard synthesis. Long Short-Term Memory (LSTM) neural networks were then used to predict the contour labels, but initial experiments achieved low prediction rates. We conclude that using current linguistic features for pitch stylisation label mapping is not feasible unless additional features are added. Furthermore, an open-source implementation is released.


DOI: 10.21437/SpeechProsody.2016-210

Cite as

Dall, R., Gonzalvo, X. (2016) JNDSLAM: A SLAM extension for speech synthesis. Proc. Speech Prosody 2016, 1024-1028.

BibTeX
@inproceedings{Dall+2016,
author={Rasmus Dall and Xavi Gonzalvo},
title={JNDSLAM: A SLAM extension for speech synthesis},
year=2016,
booktitle={Speech Prosody 2016},
doi={10.21437/SpeechProsody.2016-210},
url={http://dx.doi.org/10.21437/SpeechProsody.2016-210},
pages={1024--1028}
}