Objective Evaluation Using Association Between Dimensions Within Spectral Features for Statistical Parametric Speech Synthesis

Yusuke Ijima, Taichi Asami, Hideyuki Mizuno


This paper presents a novel objective evaluation technique for statistical parametric speech synthesis. Its distinguishing feature is that it focuses on the association between dimensions within the spectral features. We first use the maximal information coefficient to analyze the relationship between subjective scores and the associations of spectral features obtained from natural speech and various types of synthesized speech. The analysis results indicate that the scores improve as the association becomes weaker. We then describe the proposed objective evaluation technique, which uses a voice conversion method to detect the associations within spectral features. We perform subjective and objective experiments to investigate the relationship between subjective scores and objective scores, and compare the proposed objective scores with the mel-cepstral distortion. The results indicate that our objective scores correlate far more strongly with subjective scores than the mel-cepstral distortion does.
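For readers unfamiliar with the baseline, the following is a minimal sketch of how mel-cepstral distortion is commonly computed and how an objective score can be correlated with subjective scores. It is not the authors' code. It assumes the commonly used definition of the distortion (in dB, averaged over frames, excluding the 0th coefficient), assumes the frames are already time-aligned, and uses Pearson correlation, which the abstract does not specify. The function names and the toy values are hypothetical and are not data from the paper.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean mel-cepstral distortion in dB over time-aligned frames.

    Each frame is a sequence of mel-cepstral coefficients. The 0th
    coefficient (energy) is excluded, as is common practice.
    """
    if len(ref_frames) != len(syn_frames) or not ref_frames:
        raise ValueError("frames must be non-empty and time-aligned")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        squared_error = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(2.0 * squared_error)
    return total / len(ref_frames)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy values for illustration only (not results from the paper).
    natural = [[1.0, 0.50, -0.20], [0.9, 0.40, -0.10]]
    synthesized = [[1.0, 0.45, -0.25], [0.9, 0.30, -0.05]]
    print("MCD [dB]:", mel_cepstral_distortion(natural, synthesized))

    objective_scores = [4.1, 5.0, 5.8, 6.5]   # hypothetical per-system MCD
    subjective_scores = [4.2, 3.6, 3.1, 2.4]  # hypothetical per-system ratings
    print("correlation:", pearson(objective_scores, subjective_scores))
```

Because the mel-cepstral distortion is a distance, a lower value indicates a closer match to natural speech, so its correlation with quality ratings would normally be negative; when comparing measures, it is the strength of the correlation that matters.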


DOI: 10.21437/Interspeech.2016-584

Cite as

Ijima, Y., Asami, T., Mizuno, H. (2016) Objective Evaluation Using Association Between Dimensions Within Spectral Features for Statistical Parametric Speech Synthesis. Proc. Interspeech 2016, 337-341.

BibTeX
@inproceedings{Ijima+2016,
author={Yusuke Ijima and Taichi Asami and Hideyuki Mizuno},
title={Objective Evaluation Using Association Between Dimensions Within Spectral Features for Statistical Parametric Speech Synthesis},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-584},
url={http://dx.doi.org/10.21437/Interspeech.2016-584},
pages={337--341}
}