This paper presents a novel technique for voice conversion that treats the problem as a two-factor task and solves it with bilinear models. The spectral content of speech, represented as line spectral frequencies, is separated into so-called style and content parameterizations using a previously proposed framework. Formulating voice conversion in terms of style and content offers a flexible representation of factor interactions and facilitates the use of efficient training algorithms based on singular value decomposition and expectation maximization. In a comparison with the traditional Gaussian mixture model (GMM) based method, the results are promising and indicate increased robustness with small training sets.
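To make the style and content decomposition concrete, the following is a minimal sketch of one common bilinear formulation, the asymmetric model, fitted with a singular value decomposition. It is an illustration under stated assumptions, not the paper's implementation: speakers play the role of styles, aligned frames play the role of contents, random synthetic vectors stand in for line spectral frequencies, and the names `fit_asymmetric_bilinear` and `convert` are hypothetical. The expectation maximization training mentioned in the abstract is not shown.

```python
import numpy as np


def fit_asymmetric_bilinear(Y, n_styles, dim, rank):
    """Fit y[s, c] ~= A[s] @ b[c] by truncated SVD.

    Y has shape (n_styles * dim, n_contents): the observation vectors of
    all styles are stacked vertically, one column per content class.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    A = U[:, :rank] * s[:rank]          # stacked style matrices
    B = Vt[:rank, :]                    # one content vector per column
    return A.reshape(n_styles, dim, rank), B


def convert(A, y_src, src, tgt):
    """Map a source-style vector to the target style.

    The content vector is estimated by least squares from the source
    style matrix, then rendered with the target style matrix.
    """
    b, *_ = np.linalg.lstsq(A[src], y_src, rcond=None)
    return A[tgt] @ b


# Synthetic stand-in data (assumption: exactly bilinear, noise free).
rng = np.random.default_rng(0)
n_styles, dim, rank, n_contents = 2, 10, 4, 50
A_true = rng.normal(size=(n_styles, dim, rank))
B_true = rng.normal(size=(rank, n_contents))
Y = np.concatenate([A_true[s] @ B_true for s in range(n_styles)], axis=0)

A, B = fit_asymmetric_bilinear(Y, n_styles, dim, rank)

# Convert an unseen content from style 0 (source) to style 1 (target).
b_new = rng.normal(size=rank)
y_src = A_true[0] @ b_new
y_tgt_true = A_true[1] @ b_new
y_tgt_hat = convert(A, y_src, src=0, tgt=1)
```

With noise-free synthetic data of exact rank, the converted vector matches the true target vector up to numerical precision. Real speech features would only be approximated, and the choice of `rank` would then trade fitting accuracy against robustness.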
Bibliographic reference. Popa, Victor / Nurminen, Jani / Gabbouj, Moncef (2009): "A novel technique for voice conversion based on style and content decomposition with bilinear models", In INTERSPEECH-2009, 2655-2658.