Sixth European Conference on Speech Communication and Technology
(EUROSPEECH'99)

Budapest, Hungary
September 5-9, 1999

Speaker Conversion Through Non-Linear Frequency Warping of Straight Spectrum

Noriyasu Maeda (1), Hideki Banno (1,2), Shoji Kajita (2,3), Kazuya Takeda (1,2), Fumitada Itakura (2,3)

(1) Graduate School of Engineering, (2) Center for Information Media Science, (3) Center for Integrated Acoustic Information Research, Nagoya University, Japan

A parametric method for converting speech individuality is proposed, based on the STRAIGHT speech representation. STRAIGHT analysis-synthesis can produce high-quality speech under various kinds of transformations by using 1) pitch-synchronous windowing, 2) time-frequency spectrum interpolation, and 3) randomized all-pass filtering for shaping the phase spectrum. To exploit the smoothness of the STRAIGHT spectrum, speaker conversion is accomplished by warping the frequency axis. A warping function is trained for each class of a predetermined grouping of spectrum shapes. An evaluation test compares the proposed method with VQ prototype mapping and with linear transformation of cepstrum vectors. As a measure of converted speech quality, the mean opinion score (MOS) over 6 subjects is calculated and found to be about 1.5 points higher than that of the conventional methods, without degrading the accuracy of speech individuality discrimination.
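
The idea of warping the frequency axis of a smooth spectral envelope can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `warp_spectrum`, the piecewise-linear form of the warping function, and the hand-picked anchor frequencies are all assumptions made for illustration. In the paper the warping functions are trained per spectrum-shape class, and the envelope would come from STRAIGHT analysis rather than from a synthetic peak.

```python
import numpy as np

def warp_spectrum(envelope, warp_src_hz, warp_tgt_hz, sample_rate):
    """Resample a spectral envelope along a warped frequency axis.

    envelope     : magnitudes at uniformly spaced frequencies 0 .. sample_rate/2
    warp_src_hz,
    warp_tgt_hz  : increasing anchor frequencies defining a monotonic,
                   piecewise-linear map from source to target frequency
                   (hypothetical form, chosen only for this sketch)
    """
    envelope = np.asarray(envelope, dtype=float)
    freqs = np.linspace(0.0, sample_rate / 2.0, envelope.shape[0])
    # For every output frequency, find the source frequency that maps onto it
    # (i.e. evaluate the inverse of the warping function).
    src_freqs = np.interp(freqs, warp_tgt_hz, warp_src_hz)
    # Read the source envelope at those frequencies by linear interpolation.
    return np.interp(src_freqs, freqs, envelope)

# Toy example: a single formant-like peak at 1000 Hz is moved to 1500 Hz.
sample_rate = 16000
freqs = np.linspace(0.0, sample_rate / 2.0, 513)
source = np.exp(-0.5 * ((freqs - 1000.0) / 200.0) ** 2)
converted = warp_spectrum(
    source,
    warp_src_hz=[0.0, 1000.0, 8000.0],
    warp_tgt_hz=[0.0, 1500.0, 8000.0],
    sample_rate=sample_rate,
)
peak_hz = freqs[np.argmax(converted)]
```

Because the envelope is smooth, resampling it along a monotonic warped axis shifts spectral peaks without introducing the discontinuities that codebook-style mapping can produce, which is the property the abstract appeals to.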



Bibliographic reference.  Maeda, Noriyasu / Banno, Hideki / Kajita, Shoji / Takeda, Kazuya / Itakura, Fumitada (1999): "Speaker conversion through non-linear frequency warping of straight spectrum", In EUROSPEECH'99, 827-830.