Interspeech'2005 - Eurospeech
The goal of voice transformation (VT) is to modify the speech of a source speaker so that it is perceived as if spoken by a target speaker. In this paper, we present a speaker-specific line spectral frequency (LSF) quantization based on principal component analysis (PCA) and k-means clustering for VT. An LPC-based source-filter model is used to model the speech. Transformation is applied to the spectral characteristics of the speaker, while pitch scaling is applied to the residual signal. PCA is used to determine the principal components of the source and target LSFs to obtain a more efficient quantization. Only the dimensions with high variance are quantized, and those dimensions are used to obtain the histogram matrix that maps the two speakers during training. A dynamic programming approach is used to select the best target codeword sequence corresponding to a source codeword sequence in a sentence. This approach approximates the long-term behavior of the target speaker's LSFs while trying to preserve the relationship between subsequent frames of the source LSFs. Objective and subjective evaluations have shown that dimension reduction of LSFs before quantization and dynamic programming improve voice transformation performance.
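The pipeline summarized in the abstract can be illustrated with a minimal sketch in Python (NumPy only). It is not the authors' implementation. The following are assumptions made for illustration: the synthetic LSF-like data, time-aligned source and target frames, the number of retained dimensions (`n_keep`), the codebook size (`k`), the smoothing constant, and in particular the transition cost in `best_target_sequence`, which penalizes target frame-to-frame changes that differ from the source frame-to-frame changes. All function names are hypothetical.

```python
import numpy as np

def pca_fit(X, n_keep):
    """Return the mean and the n_keep highest-variance principal directions."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_keep]    # keep the largest ones
    return mean, eigvecs[:, order]

def quantize(X, centers):
    """Index of the nearest codeword (squared Euclidean distance) per frame."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd iterations; empty clusters keep their previous center."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = quantize(X, centers)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def histogram_matrix(src_idx, tgt_idx, k_src, k_tgt):
    """Count how often source codeword i co-occurs with target codeword j."""
    H = np.zeros((k_src, k_tgt))
    np.add.at(H, (src_idx, tgt_idx), 1.0)
    return H

def best_target_sequence(src_idx, H, src_cw, tgt_cw, weight=1.0):
    """Viterbi-style search over target codewords.

    Local score: log of the smoothed, row-normalized histogram.
    Transition score (an assumption of this sketch): negative squared
    difference between the target and source frame-to-frame changes.
    """
    P = H + 1e-3                                   # assumed smoothing constant
    logP = np.log(P / P.sum(axis=1, keepdims=True))
    T, K = len(src_idx), H.shape[1]
    # tgt_delta[a, b] = tgt_cw[b] - tgt_cw[a]  (a: previous, b: current)
    tgt_delta = tgt_cw[None, :, :] - tgt_cw[:, None, :]
    score = logP[src_idx[0]].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        src_delta = src_cw[src_idx[t]] - src_cw[src_idx[t - 1]]
        trans = -weight * ((tgt_delta - src_delta) ** 2).sum(axis=2)
        total = score[:, None] + trans             # rows: previous, cols: current
        back[t] = total.argmax(axis=0)
        score = total.max(axis=0) + logP[src_idx[t]]
    path = np.empty(T, dtype=int)
    path[-1] = score.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Synthetic, time-aligned "LSF-like" frames (illustrative data only).
rng = np.random.default_rng(0)
src_lsf = np.sort(rng.uniform(0.0, np.pi, size=(400, 10)), axis=1)
tgt_lsf = 0.9 * src_lsf + 0.1 + 0.01 * rng.standard_normal(src_lsf.shape)

n_keep, k = 4, 8                                   # assumed sizes
src_mean, src_pc = pca_fit(src_lsf, n_keep)
tgt_mean, tgt_pc = pca_fit(tgt_lsf, n_keep)
src_proj = (src_lsf - src_mean) @ src_pc           # high-variance dimensions only
tgt_proj = (tgt_lsf - tgt_mean) @ tgt_pc
src_centers = kmeans(src_proj, k)
tgt_centers = kmeans(tgt_proj, k)
src_idx = quantize(src_proj, src_centers)
tgt_idx = quantize(tgt_proj, tgt_centers)
H = histogram_matrix(src_idx, tgt_idx, k, k)

# Map codewords back to the LSF domain so frame-to-frame changes are comparable.
src_cw = src_mean + src_centers @ src_pc.T
tgt_cw = tgt_mean + tgt_centers @ tgt_pc.T
path = best_target_sequence(src_idx[:50], H, src_cw, tgt_cw)
converted = tgt_cw[path]                           # target LSFs for 50 source frames
```

Setting `weight=0.0` reduces the search to choosing, for every frame independently, the target codeword with the highest histogram count; larger values trade that local evidence for smoother agreement with the source trajectory.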
Bibliographic reference. Salor, Özgül / Demirekler, Mübeccel (2005): "Voice transformation using principle component analysis based LSF quantization and dynamic programming approach", In INTERSPEECH-2005, 1889-1892.