ISCA Archive SSW 2013

Investigation of intra-speaker spectral parameter variation and its prediction towards improvement of spectral conversion metric

Tatsuo Inukai, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura

In statistical voice conversion, distance measures between the converted and target spectral parameters are often used as evaluation/training metrics. However, even if the same speaker utters the same sentence several times, the spectral parameters of those utterances vary, and therefore a spectral distance between them still exists. Moreover, in real-time conversion, converted speech that keeps the original prosodic features of the input speech is often generated, because converting prosodic features with complex methods is essentially difficult. In such a case, an ideal sample of converted speech would be an utterance spoken by the target speaker imitating the prosody of the input speech. However, the spectral variation caused by such a prosodic change is not considered in current evaluation/training metrics. In this study, we investigate the intra-speaker spectral variation between utterances of the same sentence, focusing on mel-cepstral coefficients as the spectral parameter. Moreover, we propose a method for predicting this variation from the prosodic parameter differences between those utterances and conduct experimental evaluations to show its effectiveness.
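To make the notion of a spectral distance between two renditions of the same sentence concrete, here is a minimal sketch. It assumes the widely used mel-cepstral distortion (MCD), MCD [dB] = (10 / ln 10) · sqrt(2 · Σ_{d≥1} (c_d − c'_d)²), and mel-cepstral sequences that are already time-aligned (for example by DTW, not shown). The function names `frame_mcd` and `mean_mcd`, and the exclusion of the 0th (power) coefficient, are illustrative assumptions, not the paper's stated setup.

```python
import math
from typing import Sequence

# Scaling factor 10 / ln(10) * sqrt(2) that expresses mel-cepstral
# distortion in decibels.
MCD_SCALE = 10.0 / math.log(10.0) * math.sqrt(2.0)


def frame_mcd(c_a: Sequence[float], c_b: Sequence[float]) -> float:
    """Mel-cepstral distortion (dB) between two frames.

    Assumption: index 0 holds the 0th (power) coefficient, which is
    excluded from the distance, as is common practice.
    """
    if len(c_a) != len(c_b):
        raise ValueError("frames must have the same cepstral order")
    squared = sum((a - b) ** 2 for a, b in zip(c_a[1:], c_b[1:]))
    return MCD_SCALE * math.sqrt(squared)


def mean_mcd(seq_a: Sequence[Sequence[float]],
             seq_b: Sequence[Sequence[float]]) -> float:
    """Average frame-wise MCD over two time-aligned frame sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and time-aligned")
    total = sum(frame_mcd(a, b) for a, b in zip(seq_a, seq_b))
    return total / len(seq_a)


if __name__ == "__main__":
    # Two hypothetical aligned repetitions of one sentence by one speaker.
    take1 = [[1.0, 0.50, -0.20], [1.1, 0.40, -0.10]]
    take2 = [[0.9, 0.45, -0.25], [1.2, 0.42, -0.12]]
    print(f"intra-speaker MCD: {mean_mcd(take1, take2):.3f} dB")
```

A nonzero `mean_mcd` between two recordings of the same speaker illustrates the floor of intra-speaker variation that a conversion metric based on this distance cannot fall below.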

Index Terms: voice conversion, training/evaluation criterion, intra-speaker spectral variation, prosodic differences, prediction


Cite as: Inukai, T., Toda, T., Neubig, G., Sakti, S., Nakamura, S. (2013) Investigation of intra-speaker spectral parameter variation and its prediction towards improvement of spectral conversion metric. Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8), 89-94

@inproceedings{inukai13_ssw,
  author={Tatsuo Inukai and Tomoki Toda and Graham Neubig and Sakriani Sakti and Satoshi Nakamura},
  title={{Investigation of intra-speaker spectral parameter variation and its prediction towards improvement of spectral conversion metric}},
  year=2013,
  booktitle={Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8)},
  pages={89--94}
}