15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Application of Matrix Variate Gaussian Mixture Model to Statistical Voice Conversion

Daisuke Saito, Hidenobu Doi, Nobuaki Minematsu, Keikichi Hirose

University of Tokyo, Japan

This paper describes a novel approach to constructing a mapping function between a given speaker pair using probability density functions (PDFs) of matrix variates. In voice conversion studies, two important functions should be realized: 1) precise modeling of both the source and target feature spaces, and 2) construction of a proper transform function between these spaces. Voice conversion based on the Gaussian mixture model (GMM) is the de facto standard because of its flexibility and ease of handling. In GMM-based approaches, a joint vector space of the source and target is first constructed, and the joint PDF of the two vectors is modeled as a GMM in that joint vector space. The joint vector approach mainly focuses on precise modeling of the 'joint' feature space and does not always construct a proper transform between the two feature spaces. In contrast, the proposed method models the joint PDF as a GMM in a matrix variate space whose rows and columns respectively correspond to the two functions, and it has the potential to precisely model both the characteristics of the feature spaces and the relation between the source and target spaces.
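For reference, the conventional joint-vector GMM mapping that the abstract contrasts against can be sketched as follows. This is a minimal sketch, assuming scalar source and target features and already-trained GMM parameters; the class and function names are hypothetical and not taken from the paper. It computes the widely used minimum mean square error conversion F(x) = sum_m P(m|x) (mu_y + S_yx / S_xx * (x - mu_x)), i.e. the conditional expectation E[y | x] under the joint GMM. It is not the proposed matrix variate method.

```python
import math
from dataclasses import dataclass


@dataclass
class JointComponent:
    """One mixture component of a GMM on the joint vector z = [x, y].

    Hypothetical container; scalar features are assumed for brevity.
    Real systems use feature vectors and full covariance blocks. The
    target variance S_yy is omitted because E[y | x] does not use it.
    """
    weight: float  # mixture weight alpha_m
    mu_x: float    # source mean
    mu_y: float    # target mean
    s_xx: float    # source variance S_xx
    s_yx: float    # target-source covariance S_yx


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Univariate normal density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert(x: float, components: list[JointComponent]) -> float:
    """MMSE mapping F(x) = E[y | x] under a joint-density GMM."""
    # Weighted likelihood of x under each component's source marginal.
    likelihoods = [c.weight * gaussian_pdf(x, c.mu_x, c.s_xx) for c in components]
    total = sum(likelihoods)
    y_hat = 0.0
    for c, lik in zip(components, likelihoods):
        posterior = lik / total  # P(m | x)
        # Conditional mean of y given x within component m.
        cond_mean = c.mu_y + (c.s_yx / c.s_xx) * (x - c.mu_x)
        y_hat += posterior * cond_mean
    return y_hat
```

With two equally weighted components centred at x = -1 and x = 1, an input of x = 0 receives posterior 0.5 from each, so the output is the average of the two per-component regressions. Because every component applies its own linear regression from x to y, the quality of the transform depends on how well the joint covariance captures the source-target relation, which is the limitation the abstract attributes to the joint vector approach.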


Bibliographic reference. Saito, Daisuke / Doi, Hidenobu / Minematsu, Nobuaki / Hirose, Keikichi (2014): "Application of matrix variate Gaussian mixture model to statistical voice conversion", In INTERSPEECH-2014, 2504-2508.