INTERSPEECH 2009
10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Parameterization of Vocal Fry in HMM-Based Speech Synthesis

Hanna Silén (1), Elina Helander (1), Jani Nurminen (2), Moncef Gabbouj (1)

(1) Tampere University of Technology, Finland
(2) Nokia Devices R&D, Finland

HMM-based speech synthesis offers a way to generate speech with different voice qualities. However, speech databases sometimes contain inherent voice qualities that need to be parameterized properly. One example is vocal fry, which typically occurs at the ends of utterances. STRAIGHT is a popular mixed-excitation vocoder for HMM-based speech synthesis. The standard STRAIGHT is optimized for modal voices and may not produce high-quality speech for other voice types. Fortunately, owing to the flexibility of STRAIGHT, different F0 and aperiodicity measures can be used in synthesis without any inherent degradation in speech quality. We have replaced the STRAIGHT excitation with a representation based on a robust F0 measure and carefully determined two-band voicing. According to our analysis-synthesis experiments, the new parameterization can improve speech quality. In HMM-based speech synthesis, the quality is significantly improved, especially due to better modeling of vocal fry.


Bibliographic reference.  Silén, Hanna / Helander, Elina / Nurminen, Jani / Gabbouj, Moncef (2009): "Parameterization of vocal fry in HMM-based speech synthesis", In INTERSPEECH-2009, 1775-1778.