Common voice conversion systems use a spectral or time-domain mapping to convert speech from one speaker to another. The converted speech does not sound natural because the spectral and time-domain patterns of the two speakers' speech do not match completely. In this paper we propose a method that uses inter-frame (dynamic) characteristics in addition to intra-frame characteristics to determine the converted speech frames. The method is based on vector quantization (VQ) and uses a trellis structure to find the best conversion function. Compared with other common methods, the proposed method provides high-quality converted voice, low computational complexity, and a small trained model size. Subjective and objective evaluations demonstrate the superiority of the proposed method over VQ-based and Gaussian mixture model (GMM)-based methods.
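The abstract does not give the cost functions or the trellis layout, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes paired source and target codebooks (row i of one corresponds to row i of the other), keeps the k nearest source codewords per frame as trellis nodes, uses squared Euclidean distance as the intra-frame cost, and uses a smoothness penalty between consecutive target codewords as a stand-in for the inter-frame (dynamic) cost. The function name `trellis_vq_convert` and the parameters `k` and `weight` are hypothetical.

```python
import numpy as np

def trellis_vq_convert(src_frames, src_codebook, tgt_codebook, k=3, weight=1.0):
    """Pick one target codeword per frame with a Viterbi search over a trellis.

    Assumptions (not taken from the paper): paired codebooks, squared
    Euclidean intra-frame cost, and a target-smoothness inter-frame cost.
    """
    src_frames = np.asarray(src_frames, dtype=float)
    src_codebook = np.asarray(src_codebook, dtype=float)
    tgt_codebook = np.asarray(tgt_codebook, dtype=float)
    n_frames = len(src_frames)
    k = min(k, len(src_codebook))

    # Intra-frame cost: squared distance from every frame to every source codeword.
    dist = ((src_frames[:, None, :] - src_codebook[None, :, :]) ** 2).sum(axis=2)
    cand = np.argsort(dist, axis=1)[:, :k]          # trellis nodes per frame
    local = np.take_along_axis(dist, cand, axis=1)  # node costs per frame

    cost = local[0].copy()                          # best path cost ending at each node
    back = np.zeros((n_frames, k), dtype=int)       # back-pointers for path recovery
    for t in range(1, n_frames):
        prev_tgt = tgt_codebook[cand[t - 1]]        # (k, dim)
        cur_tgt = tgt_codebook[cand[t]]             # (k, dim)
        # Inter-frame cost: squared jump between consecutive target codewords.
        trans = ((cur_tgt[None, :, :] - prev_tgt[:, None, :]) ** 2).sum(axis=2)
        total = cost[:, None] + weight * trans      # rows: previous node, cols: current node
        back[t] = total.argmin(axis=0)
        cost = total.min(axis=0) + local[t]

    # Backtrack from the cheapest final node to recover the codeword sequence.
    path = np.empty(n_frames, dtype=int)
    j = int(cost.argmin())
    for t in range(n_frames - 1, -1, -1):
        path[t] = cand[t, j]
        j = back[t, j]
    return path, tgt_codebook[path]
```

With `weight=0` the search reduces to ordinary frame-by-frame VQ mapping (each frame takes its nearest codeword). A positive `weight` lets a slightly worse per-frame match win when it avoids an abrupt jump in the converted sequence, which is the kind of inter-frame information the abstract refers to.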
Bibliographic reference. Eslami, Mahdi / Sheikhzadeh, Hamid / Sayadiyan, Abolghasem (2011): "Quality improvement of voice conversion systems based on trellis structured vector quantization", In INTERSPEECH-2011, 665-668.