10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

A Novel Technique for Voice Conversion Based on Style and Content Decomposition with Bilinear Models

Victor Popa (1), Jani Nurminen (2), Moncef Gabbouj (1)

(1) Tampere University of Technology, Finland
(2) Nokia Devices R&D, Finland

This paper presents a novel technique for voice conversion that treats the problem as a two-factor task and solves it using bilinear models. The spectral content of the speech, represented as line spectral frequencies, is separated into so-called style and content parameterizations using a framework proposed in [1]. Formulating the voice conversion problem in terms of style and content offers a flexible representation of factor interactions and facilitates the use of efficient training algorithms based on singular value decomposition and expectation maximization. In a comparison with the traditional Gaussian mixture model-based method, the results are promising and indicate increased robustness with small training sets.
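To make the style and content separation concrete, the following is a minimal sketch of one bilinear formulation fitted by singular value decomposition. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which bilinear variant is used, so the sketch assumes the asymmetric form y = A_s b_c, complete and time-aligned training data, and synthetic vectors standing in for line spectral frequencies. The names `fit_asymmetric_bilinear` and `convert`, the model dimension `J`, and all sizes are hypothetical.

```python
import numpy as np

def fit_asymmetric_bilinear(Y, n_styles, J):
    """Fit y = A_s @ b_c by truncated SVD (assumed asymmetric bilinear model).

    Y stacks the K-dimensional observations of every style (speaker)
    vertically, one column per content class: shape (n_styles * K, n_contents).
    Returns A with shape (n_styles, K, J) and B with shape (J, n_contents).
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    # Keep the J strongest components; fold the singular values into A.
    A = (U[:, :J] * s[:J]).reshape(n_styles, -1, J)
    B = Vt[:J]
    return A, B

def convert(y_src, A_src, A_tgt):
    """Estimate the content vector under the source style by least squares,
    then re-render it with the target style matrix."""
    b, *_ = np.linalg.lstsq(A_src, y_src, rcond=None)
    return A_tgt @ b

# Synthetic stand-in data (assumption): 2 speakers, 10-dimensional vectors,
# 50 aligned content classes, generated from a rank-3 bilinear model.
rng = np.random.default_rng(0)
S, K, J, C = 2, 10, 3, 50
A_true = rng.normal(size=(S, K, J))
B_true = rng.normal(size=(J, C))
Y = np.vstack([A_true[s] @ B_true for s in range(S)])

A, B = fit_asymmetric_bilinear(Y, n_styles=S, J=J)

# Convert an unseen content vector from speaker 0 to speaker 1.
b_new = rng.normal(size=J)
y_src = A_true[0] @ b_new
converted = convert(y_src, A[0], A[1])
expected = A_true[1] @ b_new
```

On this noise-free synthetic data the converted vector matches the target speaker's rendering of the same content up to numerical precision. Real speech would need aligned source and target frames, and incomplete training data would call for an iterative procedure such as the expectation maximization mentioned in the abstract, which this sketch does not cover.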


Bibliographic reference.  Popa, Victor / Nurminen, Jani / Gabbouj, Moncef (2009): "A novel technique for voice conversion based on style and content decomposition with bilinear models", In INTERSPEECH-2009, 2655-2658.