This paper proposes a cross-lingual speaker adaptation (CLSA) method based on factor analysis using bilingual speech data. A state-mapping-based method has recently been proposed for CLSA. However, that method cannot restrict the transformation to speaker-dependent characteristics alone, and it provides no theoretical framework for adapting prosody. To solve these problems, the proposed method uses bilingual speech data to simultaneously optimize model parameters representing language-dependent acoustic features and factors representing speaker characteristics, within a unified maximum likelihood framework based on a single statistical model. This simultaneous optimization is expected to improve the quality of synthesized speech for the desired speaker characteristics. Experimental results show that the proposed method can synthesize better speech than the state-mapping-based method.
Index Terms: cross-lingual speaker adaptation, factor analysis, HMM-based speech synthesis
Cite as: Yoshimura, T., Hashimoto, K., Oura, K., Nankaku, Y., Tokuda, K. (2013) Cross-lingual speaker adaptation based on factor analysis using bilingual speech data for HMM-based speech synthesis. Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8), 297-302.
@inproceedings{yoshimura13_ssw,
  author={Takenori Yoshimura and Kei Hashimoto and Keiichiro Oura and Yoshihiko Nankaku and Keiichi Tokuda},
  title={{Cross-lingual speaker adaptation based on factor analysis using bilingual speech data for HMM-based speech synthesis}},
  year=2013,
  booktitle={Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8)},
  pages={297--302}
}