13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Effects of Speaker Adaptive Training on Tensor-based Arbitrary Speaker Conversion

Daisuke Saito (1), Nobuaki Minematsu (2), Keikichi Hirose (1)

(1) Graduate School of Information Science and Technology; (2) Graduate School of Engineering;
The University of Tokyo, Japan

This paper applies speaker adaptive training techniques to tensor-based arbitrary speaker conversion. In voice conversion (VC) studies, conversion from or to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice Gaussian mixture model (EV-GMM) was proposed. Although EVC can effectively construct a conversion model for an arbitrary target speaker from only a few utterances, its performance does not improve substantially even when a large amount of adaptation data is available, because of an inherent problem with GMM supervectors. We previously proposed a tensor-based speaker space as a solution to this problem, which allows more flexible control of speaker characteristics. In this paper, to further improve VC performance, speaker adaptive training and the tensor-based speaker representation are integrated. The proposed method can construct a flexible and precise conversion model, and experimental results on one-to-many voice conversion demonstrate its effectiveness.
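The abstract contrasts representing a speaker as a single GMM mean supervector (the eigenvoice view) with representing the same means as a matrix whose mixture and feature dimensions are factored separately (the tensor view, cf. the Tucker decomposition index term). The sketch below is a toy illustration of that contrast, not the authors' method. It assumes the target speaker's GMM mean matrix is given directly (real EVC estimates weights by maximum likelihood from adaptation utterances), uses a plain truncated HOSVD for the mixture and feature factors, and omits speaker adaptive training entirely. All function names and the toy sizes are hypothetical.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def eigenvoice_space(speaker_means, k):
    """Vector view: each speaker's (M, D) means flattened into one supervector."""
    sv = speaker_means.reshape(len(speaker_means), -1)         # (S, M*D)
    bias = sv.mean(axis=0)
    _, _, vt = np.linalg.svd(sv - bias, full_matrices=False)
    return bias, vt[:k].T                                       # (M*D,), (M*D, k)

def tensor_space(speaker_means, r_mix, r_dim):
    """Matrix view (hypothetical simplification): truncated HOSVD factors
    for the mixture mode and the feature-dimension mode."""
    bias = speaker_means.mean(axis=0)                           # (M, D)
    centered = speaker_means - bias                             # (S, M, D)
    u_mix = np.linalg.svd(unfold(centered, 1), full_matrices=False)[0][:, :r_mix]
    u_dim = np.linalg.svd(unfold(centered, 2), full_matrices=False)[0][:, :r_dim]
    return bias, u_mix, u_dim

def adapt_eigenvoice(bias, basis, target):
    # Orthogonal projection onto the eigenvoices: k free weights in total.
    # (A stand-in for ML weight estimation from adaptation data.)
    w = basis.T @ (target.ravel() - bias)
    return (bias + basis @ w).reshape(target.shape)

def adapt_tensor(bias, u_mix, u_dim, target):
    # The speaker weight is an (r_mix, r_dim) core matrix, not a vector.
    core = u_mix.T @ (target - bias) @ u_dim
    return bias + u_mix @ core @ u_dim.T

rng = np.random.default_rng(0)
S, M, D = 5, 8, 4                    # training speakers, mixtures, feature dims (toy)
train = rng.normal(size=(S, M, D))   # synthetic per-speaker GMM mean matrices
target = rng.normal(size=(M, D))     # synthetic unseen target speaker

bias, basis = eigenvoice_space(train, k=S - 1)
ev_err = np.linalg.norm(adapt_eigenvoice(bias, basis, target) - target)

bias_t, u_mix, u_dim = tensor_space(train, r_mix=M, r_dim=D)
t_err = np.linalg.norm(adapt_tensor(bias_t, u_mix, u_dim, target) - target)

# The full-rank matrix model reproduces the target up to rounding;
# the vector model is capped at S - 1 eigenvoices and cannot.
print(ev_err, t_err)
```

One consequence this toy makes visible: with S training speakers, the centred supervectors span at most S − 1 directions, so the vector model cannot absorb more target detail however much data is available, whereas the matrix model's capacity is governed by the mixture and feature ranks. This illustrates the kind of limitation the abstract describes; it is not necessarily the paper's exact argument.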

Index Terms: voice conversion, Gaussian mixture model, eigenvoice, Tucker decomposition, speaker adaptive training


Bibliographic reference.  Saito, Daisuke / Minematsu, Nobuaki / Hirose, Keikichi (2012): "Effects of speaker adaptive training on tensor-based arbitrary speaker conversion", In INTERSPEECH-2012, 98-101.