ISCA Archive Interspeech 2009

A novel technique for voice conversion based on style and content decomposition with bilinear models

Victor Popa, Jani Nurminen, Moncef Gabbouj

This paper presents a novel technique for voice conversion that treats the problem as a two-factor task and solves it using bilinear models. The spectral content of the speech, represented as line spectral frequencies, is separated into so-called style and content parameterizations using a framework proposed in [1]. Formulating the voice conversion problem in terms of style and content offers a flexible representation of factor interactions and facilitates the use of efficient training algorithms based on singular value decomposition and expectation maximization. In a comparison with the traditional Gaussian mixture model based method, the results are promising and indicate increased robustness with small training sets.
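As an illustration of the style and content idea, the following is a minimal sketch of a generic asymmetric bilinear model fitted with a singular value decomposition. It is not taken from the paper and may differ from the authors' procedure. It assumes time-aligned source and target feature vectors (for example, line spectral frequency vectors) stored as matrix columns; the function names `fit_asymmetric_bilinear` and `convert`, the model dimension `J`, and the least-squares content estimate are all assumptions made for this example.

```python
import numpy as np

def fit_asymmetric_bilinear(Y_src, Y_tgt, J):
    """Fit a two-style asymmetric bilinear model y = A_style @ b_content.

    Y_src, Y_tgt: (K, C) arrays whose columns are time-aligned feature
    vectors of the source and target speaker (assumed input layout).
    J: model dimensionality, i.e. the length of each content vector.
    """
    K = Y_src.shape[0]
    # Stack the two speakers (styles) vertically: one column per content.
    Y = np.vstack([Y_src, Y_tgt])                      # shape (2K, C)
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    A = U[:, :J] * s[:J]                               # stacked style matrices
    B = Vt[:J, :]                                      # content vectors as columns
    return A[:K], A[K:], B

def convert(y_src, A_src, A_tgt):
    """Map one source frame to the target style.

    The content vector is estimated by least squares in the source style
    and then rendered with the target style matrix.
    """
    b, *_ = np.linalg.lstsq(A_src, y_src, rcond=None)
    return A_tgt @ b
```

In this sketch the style matrices `A_src` and `A_tgt` capture the speaker-dependent mapping, while each column of `B` is a speaker-independent content vector. A new source frame would be converted with `convert(frame, A_src, A_tgt)` after fitting on the aligned training frames.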


doi: 10.21437/Interspeech.2009-498

Cite as: Popa, V., Nurminen, J., Gabbouj, M. (2009) A novel technique for voice conversion based on style and content decomposition with bilinear models. Proc. Interspeech 2009, 2655-2658, doi: 10.21437/Interspeech.2009-498

@inproceedings{popa09_interspeech,
  author={Victor Popa and Jani Nurminen and Moncef Gabbouj},
  title={{A novel technique for voice conversion based on style and content decomposition with bilinear models}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2655--2658},
  doi={10.21437/Interspeech.2009-498}
}