Eighth ISCA Workshop on Speech Synthesis
Barcelona, Catalonia, Spain
This paper describes the implementation of a unit selection text-to-speech system that incorporates a statistical model cost (sCost), in addition to target and join costs, to control the selection of unit candidates. sCost, a quality control measure, is calculated off-line for each unit by comparing HMM-based synthesis and recorded speech using their corresponding unit segment labels. Dynamic time warping (DTW) is used to perform this comparison at the level of spectrum, pitch and voicing strengths. The method has been tested on unit selection voices created from audiobook data. Preliminary results indicate that using sCost based only on spectrum introduces more variety in pronunciation style but adversely affects quality, whereas using sCost based on spectrum, pitch and voicing strengths significantly improves quality while maintaining a more stable narrative style.

Index Terms: text-to-speech synthesis, unit selection synthesis, statistical parametric synthesis, quality control
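The off-line comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the local distance, the path constraints, the normalisation, or how the per-stream distances are combined, so everything below is an assumption made for illustration: a Euclidean frame distance, the standard three-way DTW recursion, normalisation by the summed sequence lengths, and a weighted sum over feature streams. The names `dtw_distance`, `scost`, and the stream keys are hypothetical and do not come from the paper.

```python
import math


def dtw_distance(a, b):
    """Length-normalised DTW distance between two sequences of feature vectors.

    Assumption: Euclidean local cost and the basic (match, insert, delete)
    step pattern; the paper may use different choices.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # acc[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            acc[i][j] = local + min(
                acc[i - 1][j],      # advance in a only
                acc[i][j - 1],      # advance in b only
                acc[i - 1][j - 1],  # advance in both
            )
    # Normalise so that long units are not penalised merely for their length.
    return acc[n][m] / (n + m)


def scost(recorded, synthesized, weights):
    """Hypothetical per-unit sCost: weighted sum of per-stream DTW distances.

    `recorded` and `synthesized` map a stream name to a list of frames;
    `weights` maps the same stream names to illustrative weights.
    """
    return sum(
        w * dtw_distance(recorded[stream], synthesized[stream])
        for stream, w in weights.items()
    )
```

Under these assumptions, a spectrum-only sCost corresponds to passing a single stream in `weights`, while the second configuration in the abstract corresponds to including spectrum, pitch and voicing strength streams for the same unit.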
Bibliographic reference. Pammi, Sathish / Charfuelan, Marcela (2013): "HMM-based sCost quality control for unit selection speech synthesis", In SSW8, 53-57.