ISCA Archive SpeechProsody 2010

Resynthesis of prosodic information using the cepstrum vocoder

Hussein Hussein, Guntram Strecha, Rüdiger Hoffmann

The naturalness of synthetic speech depends on the automatic extraction of prosodic features and on prosody modeling. To improve the naturalness of synthesized speech, we apply the concept of analysis-by-synthesis to prosodic information. In a recognition model, the accents and phrases of the speech signal were extracted using the quantitative Fujisaki model. In a generative model, we resynthesized the speech signal using a cepstrum vocoder. The excitation signal of the vocoder is given by the pitch marks (PM), which were calculated from multiple levels of the accent and phrase marking algorithm. A preference test was performed to assess the proposed method. For every speech signal, four signals were resynthesized according to the calculated PM, and evaluators compared the resynthesized signals with one another. The results show that the quality of the resynthesized signal is better after prosodic marking.
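
The abstract gives no equations, so the following is only a minimal sketch of the general idea, not the authors' implementation. It uses the commonly cited form of the Fujisaki model, in which log F0 is a base value plus phrase components (impulse responses) and accent components (step responses). The parameter values `alpha`, `beta` and `gamma` are typical illustrative defaults, and all function names are hypothetical. The pitch mark placement by phase accumulation is an assumption made for illustration; it stands in for, and is not, the multi-level accent and phrase marking algorithm of the paper.

```python
import math


def phrase_response(t, alpha=2.0):
    """Phrase control impulse response Gp(t) = alpha^2 * t * exp(-alpha * t), t >= 0."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=20.0, gamma=0.9):
    """Accent control step response Ga(t), capped at the ceiling gamma, t >= 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrases, accents, alpha=2.0, beta=20.0, gamma=0.9):
    """F0 in Hz at time t (seconds).

    fb      : base frequency in Hz
    phrases : list of (T0, Ap) phrase commands (onset time, magnitude)
    accents : list of (T1, T2, Aa) accent commands (onset, offset, amplitude)
    """
    log_f0 = math.log(fb)
    for t0, ap in phrases:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accents:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)


def pitch_marks(duration, fb, phrases, accents, step=0.0005):
    """Assumed scheme: emit a mark whenever the integral of F0 completes one cycle."""
    marks, phase, t = [0.0], 0.0, 0.0
    while t < duration:
        phase += fujisaki_f0(t, fb, phrases, accents) * step
        t += step
        if phase >= 1.0:
            phase -= 1.0
            marks.append(t)
    return marks
```

With one phrase command and one accent command, for example `pitch_marks(1.0, 100.0, [(0.0, 0.4)], [(0.3, 0.7, 0.5)])`, the marks lie closer together while the accent is active, because F0 is raised there. Such a list of mark times could then serve as the positions of excitation pulses for a vocoder.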

Index Terms: analysis-by-synthesis, prosodic marking, Fujisaki model

Cite as: Hussein, H., Strecha, G., Hoffmann, R. (2010) Resynthesis of prosodic information using the cepstrum vocoder. Proc. Speech Prosody 2010, paper 358

  author={Hussein Hussein and Guntram Strecha and Rüdiger Hoffmann},
  title={{Resynthesis of prosodic information using the cepstrum vocoder}},
  booktitle={Proc. Speech Prosody 2010},
  pages={paper 358}