7th International Conference on Spoken Language Processing
September 16-20, 2002
In this paper we consider the estimation and mapping of time-varying formant model parameters and orders for voice transformation. The model order is the number of perceptually significant formant trajectories, estimated from an analysis of the poles of "over-modelled" linear prediction models of the source and target speech. A 2-D HMM with NF left-to-right states across frequency and M states across time is used to classify formant observations into NF sequential formant clusters. A formant-based non-uniform frequency warping method is proposed for voice transformation. In this method, the speech spectrum is divided into NF+1 formant bands, and a transformation is estimated for each formant band of a phoneme model. Multi-mixture Gaussian models are used to model the distribution of the parameters in each formant band. The resulting voice mapping yields perceptually high-quality results.
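The abstract does not state the form of the per-band transformation. As a minimal sketch only, assume a piecewise-linear warp whose breakpoints are the NF source formant frequencies mapped onto the NF target formant frequencies, with 0 Hz and the Nyquist frequency held fixed. This yields exactly NF+1 bands, each mapped linearly. The helper names `make_formant_warp` and `warp_spectrum` are hypothetical. The HMM formant tracking and the per-band Gaussian-mixture mapping are not shown here.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence


def make_formant_warp(src_formants: Sequence[float],
                      tgt_formants: Sequence[float],
                      nyquist: float) -> Callable[[float], float]:
    """Hypothetical helper: piecewise-linear warp over NF + 1 formant bands.

    Band edges are 0 Hz, the NF formant frequencies and the Nyquist
    frequency. Each source band is mapped linearly onto the matching
    target band, so every source formant lands on its target formant.
    (Assumed form; the paper's per-band transform may differ.)
    """
    if len(src_formants) != len(tgt_formants):
        raise ValueError("source and target need the same number of formants")
    src_edges = [0.0, *src_formants, nyquist]
    tgt_edges = [0.0, *tgt_formants, nyquist]
    for edges in (src_edges, tgt_edges):
        if any(a >= b for a, b in zip(edges, edges[1:])):
            raise ValueError("formants must increase strictly and lie below Nyquist")

    def warp(f: float) -> float:
        if not 0.0 <= f <= nyquist:
            raise ValueError("frequency outside [0, nyquist]")
        # Band index k; f == nyquist is assigned to the last band.
        k = min(bisect_right(src_edges, f) - 1, len(src_edges) - 2)
        s0, s1 = src_edges[k], src_edges[k + 1]
        t0, t1 = tgt_edges[k], tgt_edges[k + 1]
        return t0 + (f - s0) * (t1 - t0) / (s1 - s0)

    return warp


def warp_spectrum(mag: Sequence[float], nyquist: float,
                  src_formants: Sequence[float],
                  tgt_formants: Sequence[float]) -> List[float]:
    """Hypothetical helper: warp a magnitude spectrum sampled on [0, nyquist].

    Each output bin at frequency f reads the source spectrum at
    inverse_warp(f), using linear interpolation between source bins.
    """
    n = len(mag)
    if n < 2:
        raise ValueError("need at least two spectral bins")
    # The inverse of this piecewise-linear warp is the warp with roles swapped.
    inverse = make_formant_warp(tgt_formants, src_formants, nyquist)
    step = nyquist / (n - 1)
    out = []
    for i in range(n):
        f = min(nyquist * i / (n - 1), nyquist)
        x = min(inverse(f) / step, n - 1.0)  # fractional source bin index
        j = min(int(x), n - 2)
        frac = x - j
        out.append((1.0 - frac) * mag[j] + frac * mag[j + 1])
    return out
```

For example, with source formants at 500, 1500 and 2500 Hz mapped to 600, 1700 and 2800 Hz and a Nyquist frequency of 8000 Hz, `make_formant_warp(...)(1000.0)` falls in the second band and returns 1150.0. Resampling through the inverse warp, rather than pushing source bins forward, guarantees that every output bin receives a value.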
Bibliographic reference. Ho, Ching-Hsiang / Rentzos, Dimitrios / Vaseghi, Saeed (2002): "Formant model estimation and transformation for voice morphing", In ICSLP-2002, 2149-2152.