ISCA Archive SSW 2013

Cross-lingual speaker adaptation based on factor analysis using bilingual speech data for HMM-based speech synthesis

Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

This paper proposes a cross-lingual speaker adaptation (CLSA) method based on factor analysis using bilingual speech data. A state-mapping-based method has recently been proposed for CLSA. However, that method cannot transform speaker-dependent characteristics alone, and it offers no theoretical framework for adapting prosody. To solve these problems, the proposed method uses bilingual speech data to simultaneously optimize model parameters representing language-dependent acoustic features and factors representing speaker characteristics, within a unified maximum likelihood framework based on a single statistical model. This simultaneous optimization is expected to improve the quality of speech synthesized with the desired speaker characteristics. Experimental results show that the proposed method can synthesize better speech than the state-mapping-based method.

Index Terms: cross-lingual speaker adaptation, factor analysis, HMM-based speech synthesis
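
The abstract does not state the model equations, so the following is only a toy sketch of the general idea, not the authors' method. It assumes each bilingual training speaker is summarized by one mean vector per language (rather than by HMM state distributions), models each language's mean as a language-dependent bias plus a language-dependent loading matrix times a speaker factor shared across both languages, and fits both languages jointly with a truncated SVD in place of the paper's maximum likelihood estimation. The function names `fit_bilingual_factors` and `adapt_cross_lingual` are hypothetical.

```python
import numpy as np


def fit_bilingual_factors(means_l1, means_l2, n_factors):
    """Jointly fit language-dependent biases and loadings (toy version).

    means_l1, means_l2: arrays of shape (n_speakers, dim), one row per
    bilingual speaker, holding that speaker's mean vector in each language.
    Returns ((bias1, loading1), (bias2, loading2)).
    """
    # Stack both languages so each speaker gets ONE shared factor vector.
    stacked = np.hstack([means_l1, means_l2])
    bias = stacked.mean(axis=0)
    # Truncated SVD of the centered data (a stand-in for ML estimation).
    _, s, vt = np.linalg.svd(stacked - bias, full_matrices=False)
    loading = vt[:n_factors].T * s[:n_factors]
    d1 = means_l1.shape[1]
    return (bias[:d1], loading[:d1]), (bias[d1:], loading[d1:])


def adapt_cross_lingual(target_mean_l1, model_l1, model_l2):
    """Estimate a speaker factor from language-1 data only, then use it
    with the language-2 parameters to predict that speaker's language-2 mean."""
    bias1, loading1 = model_l1
    bias2, loading2 = model_l2
    # Least-squares estimate of the speaker factor in language 1.
    factor, *_ = np.linalg.lstsq(loading1, target_mean_l1 - bias1, rcond=None)
    return bias2 + loading2 @ factor
```

In this sketch the language-dependent parameters (biases and loadings) and the speaker factors come out of one joint fit over the stacked bilingual data, which loosely mirrors the simultaneous optimization described in the abstract. Only the speaker factor is carried across languages at adaptation time.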


Cite as: Yoshimura, T., Hashimoto, K., Oura, K., Nankaku, Y., Tokuda, K. (2013) Cross-lingual speaker adaptation based on factor analysis using bilingual speech data for HMM-based speech synthesis. Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8), 297-302

@inproceedings{yoshimura13_ssw,
  author={Takenori Yoshimura and Kei Hashimoto and Keiichiro Oura and Yoshihiko Nankaku and Keiichi Tokuda},
  title={{Cross-lingual speaker adaptation based on factor analysis using bilingual speech data for HMM-based speech synthesis}},
  year=2013,
  booktitle={Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8)},
  pages={297--302}
}