ITRW on Non-Linear Speech Processing
(NOLISP 07)

Paris, France
May 22-25, 2007

HMM-based Spanish speech synthesis using CBR as F0 estimator

Xavi Gonzalvo, Ignasi Iriondo, Joan Claudi Socoró, Francesc Alías, Carlos Monzo

Department of Communications and Signal Theory, Enginyeria i Arquitectura La Salle, Ramon Llull University, Barcelona, Spain

Hidden Markov Model based text-to-speech (HMM-TTS) synthesis is a technique for generating speech from trained statistical models in which the spectrum, pitch and durations of basic speech units are modelled jointly. This work describes a Spanish HMM-TTS system that uses case-based reasoning (CBR) as an F0 estimator, and analyses its performance both objectively and subjectively. The experiments were conducted on a reliably labelled speech corpus, whose units were clustered using contextual factors specific to the Spanish language. The results show that CBR-based F0 estimation can improve on the HMM-based baseline when synthesizing short non-declarative sentences and when only reduced contextual information is available.


Bibliographic reference. Gonzalvo, Xavi / Iriondo, Ignasi / Socoró, Joan Claudi / Alías, Francesc / Monzo, Carlos (2007): "HMM-based Spanish speech synthesis using CBR as F0 estimator", in NOLISP-2007, 7-10.