Eighth ISCA Workshop on Speech Synthesis
Barcelona, Catalonia, Spain
This paper proposes a cross-lingual speaker adaptation (CLSA) method based on factor analysis using bilingual speech data. A state-mapping-based method has recently been proposed for CLSA. However, that method cannot transform speaker-dependent characteristics alone. Furthermore, there is no theoretical framework for adapting prosody. To solve these problems, the proposed method simultaneously optimizes model parameters representing language-dependent acoustic features and factors representing speaker characteristics within a unified maximum-likelihood framework based on a single statistical model, using bilingual speech data. This simultaneous optimization is expected to improve the quality of speech synthesized with the desired speaker characteristics. Experimental results show that the proposed method can synthesize better speech than the state-mapping-based method.
Index Terms: cross-lingual speaker adaptation, factor analysis, HMM-based speech synthesis
Bibliographic reference. Yoshimura, Takenori / Hashimoto, Kei / Oura, Keiichiro / Nankaku, Yoshihiko / Tokuda, Keiichi (2013): "Cross-lingual speaker adaptation based on factor analysis using bilingual speech data for HMM-based speech synthesis", In SSW8, 297-302.