Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Voice Adaptation Using Multi-Functional Transformation with Weighting by Radial Basis Function Networks

Naoto Iwahashi (1), Yoshinori Sagisaka (2)

(1) SONY Research Center, Tokyo, Japan
(2) ATR Interpreting Telecommunications Research Labs., Kyoto, Japan

This paper describes a spectral transformation method for voice conversion that uses multiple linear functions weighted by Radial Basis Function (RBF) networks. Spectral transformation by speaker interpolation with a single linear function can obtain a moderate mapping from a small amount of training data. However, its mapping does not become more precise even when larger amounts of data are available. To cope with this, multiple linear functions with weighting are used. The weight values are determined by a weighting function represented by RBF networks. The parameters of both the linear functions and the weighting function are adapted simultaneously. The method was evaluated by the reduction rate of the spectral distance from the generated spectrum to the target speaker, relative to the distance from the interpolated speaker closest to the target. While the distance reduction rate was about 42% using the single linear function, it increased to 48% using the multi-functional transformation with two linear functions.
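
The weighted combination described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the RBFs are taken to be Gaussians normalized to sum to one, the output is a weighted sum of affine maps, and the reduction rate is read as the relative decrease in distance. All function names are hypothetical, and the simultaneous adaptation of parameters and the speaker interpolation step are not shown.

```python
import math


def rbf_weights(x, centers, widths):
    """Return one weight per linear function.

    Assumed form: Gaussian RBF activations normalized to sum to 1.
    The abstract only states that RBF networks produce the weights.
    """
    acts = []
    for center, width in zip(centers, widths):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        acts.append(math.exp(-sq_dist / (2.0 * width * width)))
    total = sum(acts)
    if total == 0.0:
        # All activations underflowed: fall back to uniform weights.
        return [1.0 / len(acts)] * len(acts)
    return [a / total for a in acts]


def apply_linear(matrix, bias, x):
    """Apply one affine map y = matrix * x + bias."""
    return [
        sum(m_ij * x_j for m_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(matrix, bias)
    ]


def transform(x, linear_fns, centers, widths):
    """Combine the outputs of several affine maps using RBF weights."""
    weights = rbf_weights(x, centers, widths)
    out = [0.0] * len(linear_fns[0][1])
    for weight, (matrix, bias) in zip(weights, linear_fns):
        y = apply_linear(matrix, bias, x)
        out = [o + weight * y_i for o, y_i in zip(out, y)]
    return out


def distance_reduction_rate(baseline_distance, adapted_distance):
    """Relative decrease in spectral distance (assumed definition)."""
    return 1.0 - adapted_distance / baseline_distance


# Toy example with two linear functions on 2-dimensional vectors.
linear_fns = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),  # identity map
    ([[2.0, 0.0], [0.0, 2.0]], [1.0, 1.0]),  # scale by 2, shift by 1
]
centers = [[0.0, 0.0], [10.0, 10.0]]
widths = [1.0, 1.0]

midpoint_output = transform([5.0, 5.0], linear_fns, centers, widths)
```

At an input equidistant from both centers the two maps contribute equally, while near one center that center's map dominates. With a single linear function the weighting has no effect, which corresponds to the single-function baseline the abstract compares against.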

Bibliographic reference. Iwahashi, Naoto / Sagisaka, Yoshinori (1994): "Voice adaptation using multi-functional transformation with weighting by radial basis function networks", in ICSLP-1994, 1599-1602.