Eighth ISCA Workshop on Speech Synthesis

Barcelona, Catalonia, Spain
August 31-September 2, 2013

HMM-based sCost quality control for unit selection speech synthesis

Sathish Pammi (1), Marcela Charfuelan (2)

(1) ISIR, Université Pierre et Marie Curie (UPMC), France; (2) DFKI GmbH, Germany

This paper describes the implementation of a unit selection text-to-speech system that incorporates a statistical model cost (sCost), in addition to target and join costs, to control the selection of unit candidates. sCost, a quality control measure, is calculated off-line for each unit by comparing HMM-based synthesis and recorded speech, using their corresponding unit segment labels. Dynamic time warping (DTW) is used to perform this comparison at the level of spectrum, pitch and voicing strengths. The method has been tested on unit selection voices created from audio book data. Preliminary results indicate that using sCost based only on spectrum introduces more variety in pronunciation style but degrades quality, whereas using sCost based on spectrum, pitch and voicing strengths significantly improves quality while maintaining a more stable narrative style.

Index Terms: Text-to-speech synthesis, unit selection synthesis, statistical parametric synthesis, quality control
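The off-line comparison described in the abstract can be illustrated with a minimal sketch. It assumes each unit is available as two sets of per-frame feature vectors (one from the recording, one from the HMM-based synthesis) and combines per-stream DTW distances with a weighted sum. The function names, the Euclidean frame distance, the path-length normalization and the stream weights are all illustrative assumptions, not details taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature frames (assumed frame distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Classic DTW cost between two frame sequences.

    The accumulated cost is divided by len(seq_a) + len(seq_b) so that units
    of different durations are comparable (an assumed normalization).
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    # acc[i][j] = minimal accumulated cost aligning seq_a[:i] with seq_b[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


def unit_scost(recorded, synthesized, weights):
    """Hypothetical sCost for one unit: weighted sum of per-stream DTW costs.

    `recorded` and `synthesized` map a stream name (e.g. "spectrum", "pitch")
    to a list of frames; `weights` maps stream names to illustrative weights.
    """
    return sum(
        w * dtw_distance(recorded[stream], synthesized[stream])
        for stream, w in weights.items()
    )


# Toy unit: identical spectrum frames, slightly different pitch.
recorded = {"spectrum": [[0.0], [1.0]], "pitch": [[100.0]]}
synthesized = {"spectrum": [[0.0], [1.0]], "pitch": [[103.0]]}
cost = unit_scost(recorded, synthesized, {"spectrum": 1.0, "pitch": 0.5})
```

In this sketch, restricting `weights` to the spectrum stream would correspond to the spectrum-only variant mentioned in the abstract, while including pitch and voicing strengths would correspond to the combined variant.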


Bibliographic reference. Pammi, Sathish / Charfuelan, Marcela (2013): "HMM-based sCost quality control for unit selection speech synthesis", in SSW8, 53-57.