10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

State Mapping Based Method for Cross-Lingual Speaker Adaptation in HMM-Based Speech Synthesis

Yi-Jian Wu, Yoshihiko Nankaku, Keiichi Tokuda

Nagoya Institute of Technology, Japan

A phone mapping based method was previously introduced for cross-lingual speaker adaptation in HMM-based speech synthesis. In this paper, we propose a state mapping based method for cross-lingual speaker adaptation. In this method, we first establish the state mapping between two voice models in the source and target languages using Kullback-Leibler divergence (KLD). Based on the established mapping, we introduce two approaches to cross-lingual speaker adaptation: data mapping and transform mapping. In the experiments, the state mapping based method outperformed the phone mapping based method. In addition, the data mapping approach achieved better speaker similarity, while the transform mapping approach achieved better speech quality after adaptation.
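
As an illustration of the KLD-based state mapping step, the following is a minimal sketch and not the implementation used in the paper. It assumes each HMM state is modelled by a single diagonal-covariance Gaussian, uses a symmetric form of the KLD, and maps every state of one model to its minimum-KLD state in the other; the abstract does not specify these details. The function names (`kld_diag_gauss`, `symmetric_kld`, `build_state_mapping`) and the toy state parameters are hypothetical.

```python
import math


def kld_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two diagonal-covariance Gaussians (closed form)."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kld(p, q):
    """Symmetric KLD; p and q are (mean, variance) pairs. Assumed form."""
    return kld_diag_gauss(*p, *q) + kld_diag_gauss(*q, *p)


def build_state_mapping(states_a, states_b):
    """For each state in states_a, return the index of its nearest state in states_b."""
    return [
        min(range(len(states_b)), key=lambda j: symmetric_kld(s, states_b[j]))
        for s in states_a
    ]


# Toy states (mean, variance) for two voice models; values are made up.
states_a = [([0.0, 0.0], [1.0, 1.0]), ([5.0, 5.0], [1.0, 1.0])]
states_b = [([4.8, 5.1], [1.0, 1.2]), ([0.1, -0.1], [0.9, 1.0])]

mapping = build_state_mapping(states_a, states_b)
print(mapping)
```

Given such a mapping, a data mapping approach could relabel adaptation data from one language with states of the other, and a transform mapping approach could carry transforms estimated for one model's states over to the mapped states; how the paper realizes either approach is not described in the abstract.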


Bibliographic reference. Wu, Yi-Jian / Nankaku, Yoshihiko / Tokuda, Keiichi (2009): "State mapping based method for cross-lingual speaker adaptation in HMM-based speech synthesis", in INTERSPEECH-2009, 528-531.