This paper presents an implementation of real-time statistical voice conversion (VC) based on Gaussian mixture models (GMMs). Real-time conversion processing is essential for developing VC applications that enhance human-to-human speech communication. Reducing the computational complexity of the conversion processing is also useful, because it makes VC applications available with limited resources. In this paper, we propose a method for implementing real-time VC based on low-delay conversion processing that considers dynamic features and global variance. We also propose computationally efficient VC processing based on fast source feature extraction and diagonalization of full covariance matrices. Experimental results show that the proposed methods work reasonably well.
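For readers unfamiliar with GMM-based conversion, the following is a minimal sketch of the classical frame-by-frame mapping, in which each source feature vector is converted to the posterior-weighted sum of per-component conditional means. It is not the low-delay method of this paper: it ignores dynamic features and global variance, and it assumes for brevity that every covariance block is already diagonal, which is a simplification and not the diagonalization procedure proposed here. The function name `convert_frame` and all parameter names are hypothetical.

```python
import math

def convert_frame(x, weights, mu_x, mu_y, var_xx, cov_yx):
    """Convert one source frame x with a joint GMM (illustrative sketch only).

    Assumption: covariance blocks are diagonal, so var_xx[m][d] is the source
    variance and cov_yx[m][d] the target-source cross-covariance of
    dimension d in mixture component m.
    """
    n_mix = len(weights)
    dim = len(x)

    # Log of weight times source likelihood for each mixture component.
    log_lik = []
    for m in range(n_mix):
        ll = math.log(weights[m])
        for d in range(dim):
            diff = x[d] - mu_x[m][d]
            ll -= 0.5 * (math.log(2.0 * math.pi * var_xx[m][d])
                         + diff * diff / var_xx[m][d])
        log_lik.append(ll)

    # Normalise to posteriors P(m | x), subtracting the peak for stability.
    peak = max(log_lik)
    post = [math.exp(ll - peak) for ll in log_lik]
    total = sum(post)
    post = [p / total for p in post]

    # Posterior-weighted sum of the conditional means E[y | x, m].
    y = [0.0] * dim
    for m in range(n_mix):
        for d in range(dim):
            slope = cov_yx[m][d] / var_xx[m][d]
            cond_mean = mu_y[m][d] + slope * (x[d] - mu_x[m][d])
            y[d] += post[m] * cond_mean
    return y
```

With diagonal blocks, the cost per frame grows linearly with the feature dimension instead of quadratically, which gives a rough idea of why diagonal covariance structure is attractive when resources are limited.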
Index Terms: voice conversion, real-time processing, low-delay conversion, computational efficiency
Bibliographic reference. Toda, Tomoki / Muramatsu, Takashi / Banno, Hideki (2012): "Implementation of computationally efficient real-time voice conversion", In INTERSPEECH-2012, 94-97.