Eighth ISCA Workshop on Speech Synthesis
Barcelona, Catalonia, Spain
In HTS, an HMM-based speech synthesis system, about fifty contextual factors are
used to label each segment when synthesizing English utterances. Published studies
indicate that most of them are used for clustering the prosodic component of
speech. Nevertheless, the influence of all these factors on modeling is still unclear.
This paper analyzes the effect of contextual factors on acoustic parameter modeling for French synthesis. Two objective evaluation methods and one subjective evaluation are used. The first objective method relies on a GMM-based approach to give a global evaluation of the synthetic acoustic space. The second is based on a pairwise distance chosen according to the acoustic parameter being evaluated. Finally, a subjective evaluation completes the study.
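As an illustration of what a parameter-specific pairwise distance can look like, the sketch below computes two measures commonly used for this kind of comparison: an RMSE on F0 and a mel-cepstral distortion on the spectrum. The abstract does not name the distances actually used, so both choices, the function names, and the convention that 0.0 marks an unvoiced frame are assumptions made for illustration only. The natural and synthetic sequences are also assumed to be already time-aligned frame by frame.

```python
import math


def f0_rmse(natural_f0, synthetic_f0):
    """RMSE in Hz over frames voiced in both contours.

    Assumption: an F0 value of 0.0 marks an unvoiced frame.
    """
    pairs = [(n, s) for n, s in zip(natural_f0, synthetic_f0) if n > 0 and s > 0]
    if not pairs:
        raise ValueError("no frame is voiced in both contours")
    return math.sqrt(sum((n - s) ** 2 for n, s in pairs) / len(pairs))


def mel_cepstral_distortion(natural_mcep, synthetic_mcep):
    """Mean mel-cepstral distortion in dB over time-aligned frames.

    Each frame is a list of mel-cepstral coefficients; the first
    coefficient (c0, energy) is skipped, as is common practice.
    """
    if not natural_mcep or len(natural_mcep) != len(synthetic_mcep):
        raise ValueError("sequences must be non-empty and of equal length")
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for nat, syn in zip(natural_mcep, synthetic_mcep):
        squared = sum((a - b) ** 2 for a, b in zip(nat[1:], syn[1:]))
        total += scale * math.sqrt(squared)
    return total / len(natural_mcep)
```

With such functions, one model trained with a given set of contextual factors can be compared against another by averaging each distance over the same test utterances; a lower value indicates synthetic parameters closer to the natural ones.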
Experimental results show that using phonetic context improves overall spectrum and duration modeling, and that using syllable information improves F0 modeling. However, the other contextual factors do not significantly improve the quality of the HTS models.
Index Terms: HTS, Evaluation, Contextual factors, French synthesis
Bibliographic reference. Le Maguer, Sébastien / Barbot, Nelly / Boeffard, Olivier (2013): "Evaluation of contextual descriptors for HMM-based speech synthesis in French", In SSW8, 153-158.