Phase Modeling Using Integrated Linear Prediction Residual for Statistical Parametric Speech Synthesis

Nagaraj Adiga, S.R. Mahadeva Prasanna


Conventional statistical parametric speech synthesis (SPSS) focuses on the magnitude spectrum of speech and ignores its phase characteristics. In this work, the role of phase information in improving the naturalness of synthetic speech is explored. The phase characteristics of the excitation signal are estimated from the integrated linear prediction residual (ILPR) using an all-pass (AP) filter. The coefficients of the AP filter are estimated by minimizing an entropy-based objective function computed from the cosine phase of the analytic signal obtained from the ILPR signal. The AP filter coefficients (APCs) are used as features for modeling phase in SPSS. At synthesis time, the frame-wise generated APCs are used to add group delay to the impulse excitation, which produces the excitation signal. The proposed method is compared with the group-delay-based phase excitation used in the STRAIGHT method. The experimental results show that the proposed phase modeling yields better perceptual synthesis quality than the STRAIGHT method.
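
The synthesis-time step, in which an all-pass filter adds group delay to an impulse excitation without changing its magnitude spectrum, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the sampling rate, the pitch, and the coefficient values are hypothetical, and a single fixed set of APCs stands in for the frame-wise generated ones. The entropy-based estimation of the coefficients is not shown.

```python
import numpy as np


def allpass_filter(x, apc):
    """Apply an all-pass filter whose denominator is [1, a1, ..., ap].

    The numerator is the reversed denominator, so the magnitude response is
    unity and only the phase (group delay) of the input is changed.
    """
    a = np.concatenate(([1.0], np.asarray(apc, dtype=float)))
    b = a[::-1]
    p = len(a) - 1
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = 0.0
        # Feed-forward part: weighted sum of current and past inputs.
        for k in range(p + 1):
            if n - k >= 0:
                acc += b[k] * x[n - k]
        # Feedback part: weighted sum of past outputs.
        for k in range(1, p + 1):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y


def impulse_train(n_samples, period):
    """Unit impulses every `period` samples (a simple voiced excitation)."""
    x = np.zeros(n_samples)
    x[::period] = 1.0
    return x


# Assumed values for illustration only.
fs = 16000               # sampling rate in Hz
f0 = 100                 # fundamental frequency in Hz
apc = [-0.5, 0.2]        # hypothetical APCs; poles lie inside the unit circle

excitation = impulse_train(800, fs // f0)
shaped = allpass_filter(excitation, apc)

# The filter spreads each impulse in time but preserves signal energy.
print(np.sum(excitation ** 2), np.sum(shaped ** 2))
```

In a frame-wise setting, the coefficients would be updated per frame from the generated APC trajectory. The sketch keeps them fixed so that the phase-only effect of the all-pass structure is easy to see.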


DOI: 10.21437/Interspeech.2017-587

Cite as: Adiga, N., Prasanna, S.M. (2017) Phase Modeling Using Integrated Linear Prediction Residual for Statistical Parametric Speech Synthesis. Proc. Interspeech 2017, 3981-3985, DOI: 10.21437/Interspeech.2017-587.


@inproceedings{Adiga2017,
  author={Nagaraj Adiga and S.R. Mahadeva Prasanna},
  title={Phase Modeling Using Integrated Linear Prediction Residual for Statistical Parametric Speech Synthesis},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3981--3985},
  doi={10.21437/Interspeech.2017-587},
  url={http://dx.doi.org/10.21437/Interspeech.2017-587}
}