ISCA Archive SSW 2007

Flexible harmonic/stochastic speech synthesis

Daniel Erro, Asunción Moreno, Antonio Bonafonte

In this paper, we present our flexible harmonic/stochastic waveform generator for a speech synthesis system. Speech is modeled as the superposition of two components: a harmonic component and a stochastic (aperiodic) component. The purpose of this representation is to provide a framework with maximum flexibility for all kinds of speech transformations. In contrast to similar systems in the literature, such as HNM, our system can operate at a constant frame rate instead of using a pitch-synchronous scheme. This simplifies the analysis process, while phase coherence is guaranteed by new prosodic modification and concatenation procedures designed for this scheme. Because the system was created for voice conversion applications, in this work we first validate its performance in a speech synthesis context by comparing it with the well-known TD-PSOLA technique, using four different voices and synthesis databases of different sizes. The listeners' opinions indicate that the described methods and algorithms are preferred over PSOLA, and are therefore suitable for high-quality speech synthesis and for further voice transformations.
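The model described above, speech as the sum of a harmonic component and a stochastic component updated at a constant frame rate, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `synthesize`, the per-frame tuple layout, the 16 kHz sample rate, the 80-sample hop (5 ms), holding parameters constant within each frame, and the use of scaled white Gaussian noise for the stochastic part are all illustrative assumptions. The paper's own prosodic modification and concatenation procedures are not reproduced here.

```python
import math
import random

def synthesize(frames, fs=16000, hop=80, seed=0):
    """Hypothetical sketch of harmonic-plus-stochastic synthesis at a constant frame rate.

    frames: list of (f0_hz, harmonic_amps, noise_gain) tuples, one per hop.
    f0_hz == 0 marks an unvoiced frame (no harmonic component).
    Parameters are held constant within each frame (no interpolation).
    """
    rng = random.Random(seed)
    out = []
    phases = {}  # running phase per harmonic index, carried across frames
    for f0, amps, noise_gain in frames:
        for _ in range(hop):
            sample = 0.0
            if f0 > 0:
                for k, a in enumerate(amps, start=1):
                    fk = k * f0
                    if fk >= fs / 2:
                        break  # drop harmonics at or above Nyquist
                    # Advance the phase by the current harmonic frequency so the
                    # waveform stays continuous across frame boundaries.
                    phases[k] = (phases.get(k, 0.0) + 2 * math.pi * fk / fs) % (2 * math.pi)
                    sample += a * math.cos(phases[k])
            # Stochastic component: white Gaussian noise scaled by a per-frame gain.
            sample += noise_gain * rng.gauss(0.0, 1.0)
            out.append(sample)
    return out
```

Because each harmonic's phase is accumulated sample by sample rather than reset at every frame, the analysis frames do not need to be aligned with pitch periods; this loosely mirrors the motivation for a constant frame rate, although the paper ensures phase coherence through its own procedures.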

Cite as: Erro, D., Moreno, A., Bonafonte, A. (2007) Flexible harmonic/stochastic speech synthesis. Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6), 194-199

  author={Daniel Erro and Asunción Moreno and Antonio Bonafonte},
  title={{Flexible harmonic/stochastic speech synthesis}},
  booktitle={Proc. 6th ISCA Workshop on Speech Synthesis (SSW 6)},