Eighth ISCA Workshop on Speech Synthesis
Barcelona, Catalonia, Spain
In statistical voice conversion, distance measures between the converted and target spectral parameters are often used as evaluation/training metrics. However, even when the same speaker utters the same sentence several times, the spectral parameters of those utterances vary, so a nonzero spectral distance remains between them. Moreover, in real-time conversion, the converted speech often keeps the original prosodic features of the input speech, because converting prosodic features with complex methods is inherently difficult. In such a case, the ideal converted speech would be an utterance by the target speaker imitating the prosody of the input speech. However, the spectral variation caused by such a prosodic change is not considered in current evaluation/training metrics. In this study, we investigate intra-speaker spectral variation between utterances of the same sentence, focusing on mel-cepstral coefficients as the spectral parameter. Moreover, we propose a method for predicting this variation from prosodic parameter differences between those utterances, and we conduct experimental evaluations to show its effectiveness.
Index Terms: voice conversion, training/evaluation criterion, intra-speaker spectral variation, prosodic differences, prediction
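For illustration only: the abstract does not say which spectral distance is used. One measure widely used in voice conversion is mel-cepstral distortion, and the sketch below assumes that choice. The function name `mel_cepstral_distortion`, the exclusion of the 0th (energy) coefficient, and the requirement that the two sequences are already time-aligned are assumptions made for this example, not details taken from the paper.

```python
import math


def mel_cepstral_distortion(frames_a, frames_b):
    """Mean mel-cepstral distortion (dB) between two time-aligned sequences.

    Each argument is a list of frames; each frame is a list of mel-cepstral
    coefficients. The 0th coefficient (energy) is skipped, which is a common
    convention and an assumption here.
    """
    if not frames_a or len(frames_a) != len(frames_b):
        raise ValueError("sequences must be non-empty and time-aligned")

    scale = 10.0 / math.log(10.0)
    total = 0.0
    for a, b in zip(frames_a, frames_b):
        if len(a) != len(b):
            raise ValueError("frames must have the same order")
        # Squared Euclidean distance over coefficients 1..D.
        squared = sum((x - y) ** 2 for x, y in zip(a[1:], b[1:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(frames_a)


if __name__ == "__main__":
    # Two hypothetical utterances of the same sentence by the same speaker.
    first = [[1.0, 0.50, -0.20], [1.1, 0.40, -0.10]]
    second = [[0.9, 0.45, -0.25], [1.2, 0.42, -0.05]]
    print(mel_cepstral_distortion(first, second))
```

Under this measure, two renditions of the same sentence by the same speaker would generally give a value above zero, which is the intra-speaker variation the abstract refers to.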
Bibliographic reference. Inukai, Tatsuo / Toda, Tomoki / Neubig, Graham / Sakti, Sakriani / Nakamura, Satoshi (2013): "Investigation of intra-speaker spectral parameter variation and its prediction towards improvement of spectral conversion metric", In SSW8, 89-94.