The correlation between spectral variations and F0 changes within a vowel is analyzed first, and the variations are compared with the VQ distortion calculated in a five-vowel space. The analysis shows that an F0 change of approximately half an octave produces a spectral variation comparable to the VQ distortion obtained when the codebook size equals the number of vowels. Next, a model that predicts the variations in cepstral coefficients caused by F0 changes is built using multivariate regression analysis. Experiments show that a frame generated by the model has a remarkably small distance to the target frame. Furthermore, the model is evaluated separately in two roles: as a predictor of the spectral envelope for a given F0, and as a mapping function between feature sub-spaces. For the former, the models should be built separately for each phoneme and speaker; for the latter, an adequate selection of parameters enables speaker- and phoneme-independent models to work effectively.
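The regression idea can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not state the predictors, the cepstral order, or the distance measure, so the sketch assumes a single predictor (the log2 ratio of target F0 to source F0), one slope and intercept per cepstral coefficient, and a Euclidean cepstral distance. The function names `fit_f0_regression`, `predict_frame`, and `cepstral_distance` are hypothetical.

```python
import math


def fit_f0_regression(delta_log_f0, delta_cepstra):
    """Least-squares fit of delta_cepstra[i][k] ~ slope[k] * delta_log_f0[i] + intercept[k].

    Assumption: a single predictor (log2 F0 ratio) drives every cepstral
    coefficient; the paper's actual predictor set is not given in the abstract.
    """
    n = len(delta_log_f0)
    dim = len(delta_cepstra[0])
    mean_x = sum(delta_log_f0) / n
    var_x = sum((x - mean_x) ** 2 for x in delta_log_f0)
    slopes, intercepts = [], []
    for k in range(dim):
        ys = [row[k] for row in delta_cepstra]
        mean_y = sum(ys) / n
        cov_xy = sum((x - mean_x) * (y - mean_y)
                     for x, y in zip(delta_log_f0, ys))
        slope = cov_xy / var_x
        slopes.append(slope)
        intercepts.append(mean_y - slope * mean_x)
    return slopes, intercepts


def predict_frame(source_cepstrum, f0_source, f0_target, slopes, intercepts):
    """Shift a source cepstral frame by the variation predicted for the F0 change."""
    x = math.log2(f0_target / f0_source)  # 0.5 corresponds to half an octave
    return [c + s * x + b
            for c, s, b in zip(source_cepstrum, slopes, intercepts)]


def cepstral_distance(a, b):
    """Euclidean distance between two cepstral vectors (an assumed measure)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
```

In this sketch, a generated frame would be compared with the observed target frame via `cepstral_distance`. A half-octave change corresponds to a log2 F0 ratio of 0.5, that is, an F0 ratio of about 1.41.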
Cite as: Minematsu, N., Nakagawa, S. (1998) Modeling of variations in cepstral coefficients caused by F0 changes and its application to speech processing. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0052, doi: 10.21437/ICSLP.1998-539
@inproceedings{minematsu98_icslp,
  author={Nobuaki Minematsu and Seiichi Nakagawa},
  title={{Modeling of variations in cepstral coefficients caused by F0 changes and its application to speech processing}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0052},
  doi={10.21437/ICSLP.1998-539}
}