A parametric method for converting speech individuality is proposed, based on the STRAIGHT speech representation. STRAIGHT analysis-synthesis can produce high-quality speech under various kinds of transformation by using 1) pitch-synchronous windowing, 2) time-frequency spectrum interpolation, and 3) randomized all-pass filtering to shape the phase spectrum. To exploit the smoothness of the STRAIGHT spectrum, speaker conversion is accomplished by warping the frequency axis. A warping function is trained for each class of a predetermined grouping of spectrum shapes. An evaluation test compares the proposed method with VQ prototype mapping and with linear transformation of cepstrum vectors. As a measure of converted speech quality, the mean opinion score (MOS) of 6 subjects is calculated; it is about 1.5 points higher than for the conventional methods, with no degradation in the accuracy of speech individuality discrimination.
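The frequency-axis warping described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the form of the warping functions or how they are trained, so this example assumes a piecewise-linear warp defined by hand-picked anchor points. The function name `warp_spectrum` and the parameters `src_anchors` and `dst_anchors` are hypothetical.

```python
import numpy as np

def warp_spectrum(envelope, src_anchors, dst_anchors):
    """Warp the frequency axis of a smooth spectral envelope.

    Assumption (not from the paper): the warp is piecewise linear and is
    given by anchor points on a normalized frequency axis [0, 1]. A feature
    at src_anchors[i] in the input appears at dst_anchors[i] in the output.
    Both anchor lists must be increasing and run from 0.0 to 1.0.
    """
    envelope = np.asarray(envelope, dtype=float)
    freqs = np.linspace(0.0, 1.0, len(envelope))
    # Inverse warp: for each output frequency, find the source frequency
    # whose envelope value should be placed there.
    source_freqs = np.interp(freqs, dst_anchors, src_anchors)
    # Resample the envelope at those source frequencies.
    return np.interp(source_freqs, freqs, envelope)

# Hypothetical example: a single spectral peak at 0.25 is moved to 0.5.
freqs = np.linspace(0.0, 1.0, 257)
envelope = np.exp(-((freqs - 0.25) ** 2) / (2 * 0.03 ** 2))
warped = warp_spectrum(envelope, [0.0, 0.25, 1.0], [0.0, 0.5, 1.0])
```

In the method described in the abstract, one such warping function would be learned for each spectrum-shape class rather than specified by hand; the sketch only shows how a given warp is applied to one spectral frame.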
Cite as: Maeda, N., Hideki, B., Kajita, S., Takeda, K., Itakura, F. (1999) Speaker conversion through non-linear frequency warping of straight spectrum. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 827-830, doi: 10.21437/Eurospeech.1999-201
@inproceedings{maeda99_eurospeech, author={Noriyasu Maeda and Banno Hideki and Shoji Kajita and Kazuya Takeda and Fumitada Itakura}, title={{Speaker conversion through non-linear frequency warping of straight spectrum}}, year=1999, booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)}, pages={827--830}, doi={10.21437/Eurospeech.1999-201} }