ISCA Archive SSW 2007

Spectral conversion based on statistical models including time-sequence matching

Yoshihiko Nankaku, Kenichi Nakamura, Tomoki Toda, Keiichi Tokuda

This paper proposes a spectral conversion technique based on a new statistical model that includes time-sequence matching. In conventional GMM-based approaches, Dynamic Programming (DP) matching between source and target feature sequences is performed prior to the training of the GMMs. A frame-level similarity measure such as the Euclidean distance is typically adopted for this matching, but it might be inappropriate for converting spectral features. The likelihood function of the proposed model can directly handle two sequences of different lengths, with the frame alignment between source and target feature sequences represented by discrete hidden variables. In the proposed algorithm, the maximum likelihood criterion is applied consistently to the training of model parameters, sequence matching, and spectral conversion. In a subjective preference test, the proposed method was preferred over the conventional GMM-based method.
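
For illustration only, the conventional pre-alignment step mentioned in the abstract (DP matching under a Euclidean frame distance) can be sketched as below. This is a generic dynamic time warping routine, not the model proposed in the paper; the function name `dtw_align`, the step pattern (diagonal, vertical, horizontal), and the toy one-dimensional frames are all assumptions made for the example.

```python
import math


def dtw_align(source, target):
    """Return (source_index, target_index) pairs on a minimum-cost DP path.

    Frames are compared with the Euclidean distance; this choice and the
    three-way step pattern are assumptions of this sketch.
    """
    n, m = len(source), len(target)
    inf = float("inf")
    # cost[i][j]: minimum accumulated distance of a path ending at
    # source frame i-1 and target frame j-1.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(source[i - 1], target[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance both sequences
                cost[i - 1][j],      # advance source only
                cost[i][j - 1],      # advance target only
            )
    # Backtrack from the final frame pair to recover the alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


# Toy sequences of different lengths (hypothetical 1-D "spectral" frames).
source = [[0.0], [1.0], [2.0]]
target = [[0.0], [0.0], [1.0], [2.0]]
path = dtw_align(source, target)
```

In a conventional pipeline, the aligned frame pairs in `path` would then be used as joint vectors for GMM training. In this sketch the alignment is fixed by the distance measure before any model is trained, which is the separation the abstract argues against.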


Cite as: Nankaku, Y., Nakamura, K., Toda, T., Tokuda, K. (2007) Spectral conversion based on statistical models including time-sequence matching. Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6), 333-338

@inproceedings{nankaku07_ssw,
  author={Yoshihiko Nankaku and Kenichi Nakamura and Tomoki Toda and Keiichi Tokuda},
  title={{Spectral conversion based on statistical models including time-sequence matching}},
  year=2007,
  booktitle={Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6)},
  pages={333--338}
}