The goal of voice conversion for an interpreting telephone is to preserve the individuality of a speaker's voice when that speaker's utterances are translated and used to synthesize speech in another language. We call this problem "cross-language voice conversion". We address three issues in this paper. First, we present an algorithm for voice conversion. Our approach treats voice conversion as a mapping problem between two speakers' spectra, and speaker individuality is converted by means of mapping codebooks. Second, we show that cross-language voice conversion is possible using data from a bilingual speaker. An experiment shows that the difference in listening quality between speech coded with a codebook obtained from the same language and speech coded with a codebook obtained from the other language is very small. Finally, to generate a mapping codebook for cross-language voice conversion, we propose using a bilingual speaker's speech as a bridge, and we evaluate its performance in terms of spectral distortion. Speech converted from an English male speaker to a Japanese female speaker is as understandable as the unconverted English speech and, moreover, is recognized as female speech.
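The mapping-codebook idea summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so the histogram-weighted averaging of target codewords, the function names (`nearest`, `build_mapping_codebook`, `convert`), the use of squared Euclidean distance, and the toy two-dimensional "spectra" are all assumptions made for illustration. The sketch assumes that source and target frames have already been time-aligned.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def build_mapping_codebook(src_cb, tgt_cb, aligned_pairs):
    """Build one mapped vector per source codeword (illustrative assumption).

    Counts how often each source codeword co-occurs with each target
    codeword in the aligned frames, then averages the target codewords
    weighted by those counts.
    """
    dim = len(tgt_cb[0])
    hist = [[0] * len(tgt_cb) for _ in src_cb]
    for src_frame, tgt_frame in aligned_pairs:
        hist[nearest(src_cb, src_frame)][nearest(tgt_cb, tgt_frame)] += 1

    mapping = []
    for i, row in enumerate(hist):
        total = sum(row)
        if total == 0:
            # Source codeword never observed: keep it unchanged (assumption).
            mapping.append(list(src_cb[i]))
        else:
            mapping.append(
                [
                    sum(row[j] * tgt_cb[j][d] for j in range(len(tgt_cb))) / total
                    for d in range(dim)
                ]
            )
    return mapping


def convert(src_cb, mapping, frames):
    """Quantize each source frame and replace it with its mapped vector."""
    return [mapping[nearest(src_cb, f)] for f in frames]


# Toy data: hypothetical 2-D feature vectors standing in for spectra.
src_cb = [[0.0, 0.0], [1.0, 1.0]]
tgt_cb = [[10.0, 10.0], [20.0, 20.0]]
pairs = [
    ([0.1, 0.0], [10.2, 9.9]),
    ([0.9, 1.1], [19.8, 20.1]),
    ([1.0, 0.9], [20.0, 20.0]),
]
mapping = build_mapping_codebook(src_cb, tgt_cb, pairs)
converted = convert(src_cb, mapping, [[0.2, 0.1], [0.8, 0.9]])
```

In a cross-language setting, the aligned pairs would presumably come from the bilingual speaker acting as the bridge; how that alignment is obtained is outside the scope of this sketch.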
Cite as: Abe, M., Shikano, K., Kuwabara, H. (1990) Voice conversion for an interpreting telephone. Proc. ESCA Workshop on Speaker Characterization in Speech Technology, 40-45
@inproceedings{abe90_scst,
  author={Masanobu Abe and Kiyohiro Shikano and Hisao Kuwabara},
  title={{Voice conversion for an interpreting telephone}},
  year=1990,
  booktitle={Proc. ESCA Workshop on Speaker Characterization in Speech Technology},
  pages={40--45}
}