INTERSPEECH 2008
9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Incorporating Durational Modification in Voice Transformation

Arthur Toth, Alan W. Black

Carnegie Mellon University, USA

Voice transformation is the process of using a small amount of speech data from a target speaker to build a transformation model that can be used to generate arbitrary speech that sounds like the target speaker. One common current technique is to build Gaussian Mixture Models that map spectral aspects from source to target speakers. This paper proposes the use of duration models to improve the transformation models and output speech quality. Testing across seven target speakers shows a statistically significant improvement in a popular objective metric when duration modification is performed during both training and testing of a Gaussian Mixture Model mapping-based voice transformation system.
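
The abstract does not say how durations are modified, so the following is only a minimal sketch of what duration modification of a feature-frame sequence could look like, assuming uniform time scaling with linear interpolation between neighboring frames. The function name `scale_duration` and the `ratio` parameter are hypothetical and are not taken from the paper, whose duration models may work quite differently.

```python
def scale_duration(frames, ratio):
    """Uniformly stretch or compress a sequence of feature frames.

    Illustrative assumption only: `frames` is a list of equal-length
    lists of floats, and `ratio` is the desired output/input length
    ratio (e.g. 1.5 makes the sequence 50% longer).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if not frames:
        return []

    n_in = len(frames)
    n_out = max(1, round(n_in * ratio))
    out = []
    for i in range(n_out):
        # Map the output index back to a fractional input position.
        pos = i * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        w = pos - lo
        # Linearly interpolate between the two nearest input frames.
        out.append([(1 - w) * a + w * b
                    for a, b in zip(frames[lo], frames[hi])])
    return out
```

In a setup like the one the abstract describes, a step of this kind would be applied to the source frames both when the mapping is trained and when it is used, so that the two stages see consistently timed data.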

Bibliographic reference. Toth, Arthur / Black, Alan W. (2008): "Incorporating durational modification in voice transformation", In INTERSPEECH-2008, 1088-1091.