Incorporating durational modification in voice transformation

Arthur Toth, Alan W. Black

Voice transformation is the process of using a small amount of speech data from a target speaker to build a transformation model that can generate arbitrary speech that sounds like the target speaker. One common current technique is to build Gaussian Mixture Models that map spectral aspects from source to target speakers. This paper proposes the use of duration models to improve the transformation models and the quality of the output speech. Testing across seven target speakers shows a statistically significant improvement in a popular objective metric when duration modification is performed during both training and testing of a voice transformation system based on Gaussian Mixture Model mapping.

doi: 10.21437/Interspeech.2008-335

Cite as: Toth, A., Black, A.W. (2008) Incorporating durational modification in voice transformation. Proc. Interspeech 2008, 1088-1091, doi: 10.21437/Interspeech.2008-335

