ISCA Archive Interspeech 2009

Parameterization of vocal fry in HMM-based speech synthesis

Hanna Silén, Elina Helander, Jani Nurminen, Moncef Gabbouj

HMM-based speech synthesis offers a way to generate speech with different voice qualities. However, sometimes databases contain certain inherent voice qualities that need to be parameterized properly. One example of this is vocal fry, which typically occurs at the end of utterances. A popular mixed-excitation vocoder for HMM-based speech synthesis is STRAIGHT. The standard STRAIGHT is optimized for modal voices and may not produce high quality with other voice types. Fortunately, due to the flexibility of STRAIGHT, different F0 and aperiodicity measures can be used in the synthesis without any inherent degradation in speech quality. We have replaced the STRAIGHT excitation with a representation based on a robust F0 measure and a carefully determined two-band voicing. According to our analysis-synthesis experiments, the new parameterization can improve the speech quality. In HMM-based speech synthesis, the quality is significantly improved, especially due to the better modeling of vocal fry.

doi: 10.21437/Interspeech.2009-147

Cite as: Silén, H., Helander, E., Nurminen, J., Gabbouj, M. (2009) Parameterization of vocal fry in HMM-based speech synthesis. Proc. Interspeech 2009, 1775-1778, doi: 10.21437/Interspeech.2009-147

  author={Hanna Silén and Elina Helander and Jani Nurminen and Moncef Gabbouj},
  title={{Parameterization of vocal fry in HMM-based speech synthesis}},
  booktitle={Proc. Interspeech 2009},