Eighth ISCA Workshop on Speech Synthesis
Barcelona, Catalonia, Spain
This paper presents a voice conversion (VC) technique for noisy environments based on a sparse representation of speech. In our previous work, we discussed an exemplar-based VC technique for noisy environments. In that approach, source exemplars and target exemplars are extracted from parallel training data, in which the source and target speakers utter the same texts. The input source signal is represented as a weighted combination of the source exemplars. The converted speech is then constructed by applying the weights estimated for the source exemplars to the target exemplars. However, this exemplar-based approach must hold all training exemplars (frames), and estimating the weights of the source exemplars requires long computation times. In this paper, we propose a framework to train the basis matrices of source and target exemplars so that they share a common weight matrix. By using the basis matrices instead of the exemplars, VC is performed with lower computation times than with the exemplar-based method. The effectiveness of this method was confirmed in speaker conversion experiments using noise-added speech data, in which it was compared with an exemplar-based method and a conventional Gaussian mixture model (GMM)-based method.
Index Terms: voice conversion, sparse representation, non-negative matrix factorization, noise robustness
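The exemplar-based conversion step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the weights are estimated, so the sketch assumes the standard multiplicative update for non-negative matrix factorization under the KL divergence with an L1 sparsity penalty, and it leaves out any handling of noise. The names `estimate_activations`, `convert`, `sparsity`, and `n_iter` are hypothetical, and the dictionaries are assumed to be non-negative spectral frames whose columns are paired between the source and target speakers.

```python
import numpy as np


def estimate_activations(X, A, n_iter=200, sparsity=0.1, eps=1e-12):
    """Estimate non-negative weights H so that X ~= A @ H, with A held fixed.

    X: (D, T) non-negative input spectrogram.
    A: (D, J) non-negative dictionary, one exemplar (frame) per column.
    Assumption: KL-divergence multiplicative update with an L1 penalty.
    """
    rng = np.random.default_rng(0)
    H = rng.random((A.shape[1], X.shape[1])) + eps  # positive initialization
    col_sums = A.sum(axis=0)[:, None]               # A^T 1, shape (J, 1)
    for _ in range(n_iter):
        ratio = X / (A @ H + eps)                   # element-wise X / (A H)
        H *= (A.T @ ratio) / (col_sums + sparsity)  # update keeps H >= 0
    return H


def convert(X_src, A_src, A_tgt, **kwargs):
    """Map a source spectrogram to the target speaker.

    The weights are estimated on the source dictionary and then applied
    to the column-paired target dictionary.
    """
    H = estimate_activations(X_src, A_src, **kwargs)
    return A_tgt @ H, H


# Toy usage with random data standing in for real spectral frames.
rng = np.random.default_rng(1)
A_src = rng.random((20, 50))       # 50 paired exemplars, 20 frequency bins
A_tgt = rng.random((20, 50))
X_src = A_src @ (rng.random((50, 30)) * (rng.random((50, 30)) < 0.1))
Y, H = convert(X_src, A_src, A_tgt)
print(Y.shape, H.shape)
```

Each iteration multiplies the dictionary against every input frame, so the cost grows with the number of exemplars `J`. This is the computational burden that motivates replacing the full exemplar set with smaller trained basis matrices.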
Bibliographic reference. Takashima, Ryoichi / Aihara, Ryo / Takiguchi, Tetsuya / Ariki, Yasuo (2013): "Noise-robust voice conversion based on spectral mapping on sparse space", In SSW8, 71-75.