ISCA Archive Eurospeech 1999

Choose the best to modify the least: a new generation concatenative synthesis system

Marcello Balestri, Alberto Pacchiotti, Silvia Quazza, Pier Luigi Salza, Stefano Sandri

The paper describes a corpus-based approach applied in the evolution of ELOQUENS®, the CSELT text-to-speech synthesis system for Italian, towards multi-voice, multi-language, high-naturalness concatenative synthesis. The acoustic modules have been redesigned around the idea of reducing both the number of junctions and the need for prosodic modification. Appropriate phonetic coverage methods were applied in the acoustic database design. Automatic processing tools performed phone and diphone segmentation, pitch marking, and prosodic feature detection. The synthesis algorithm makes the best use of the speech material by searching for the longest suitable sequences in the database, according to weighted distance measures on phonetic/prosodic parameters. Signal modification techniques are applied only if necessary, to smooth residual prosodic jumps at unit boundaries. The resulting voice is quite human-sounding.
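As an illustration only, the search for the longest suitable sequences might be sketched as a greedy longest-match selection in which ties between equally long candidates are broken by a weighted prosodic distance. This is not the ELOQUENS algorithm itself, whose details the abstract does not give: the `Unit` record, the two prosodic parameters (pitch and duration), the weights `W_PITCH` and `W_DUR`, and the function names are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One phone with hypothetical prosodic parameters."""
    phone: str
    pitch_hz: float
    dur_ms: float


# Hypothetical weights for the distance measure (not taken from the paper).
W_PITCH = 1.0
W_DUR = 0.5


def distance(target, candidate):
    """Weighted sum of absolute pitch and duration differences."""
    return sum(
        W_PITCH * abs(t.pitch_hz - c.pitch_hz) + W_DUR * abs(t.dur_ms - c.dur_ms)
        for t, c in zip(target, candidate)
    )


def longest_match(target, start, database):
    """Find the longest database run whose phones match target[start:].

    Returns (length, cost, utterance_index, position), or None if the
    phone at target[start] does not occur in the database at all.
    Among runs of equal length, the lowest weighted distance wins.
    """
    best = None
    for u, utt in enumerate(database):
        for p in range(len(utt)):
            n = 0
            while (start + n < len(target) and p + n < len(utt)
                   and utt[p + n].phone == target[start + n].phone):
                n += 1
            if n == 0:
                continue
            cost = distance(target[start:start + n], utt[p:p + n])
            if best is None or (n, -cost) > (best[0], -best[1]):
                best = (n, cost, u, p)
    return best


def select_units(target, database):
    """Greedily cover the target with the longest matching runs.

    Returns a list of (utterance_index, position, length) segments;
    fewer segments means fewer junctions to smooth.
    """
    segments = []
    i = 0
    while i < len(target):
        match = longest_match(target, i, database)
        if match is None:
            raise ValueError(f"phone {target[i].phone!r} not in database")
        n, _cost, u, p = match
        segments.append((u, p, n))
        i += n
    return segments
```

In this sketch, length takes strict priority over prosodic closeness, which mirrors the stated goal of reducing junctions first. A real system would more likely trade the two off against each other, for example with a dynamic-programming search over target and concatenation costs.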


doi: 10.21437/Eurospeech.1999-499

Cite as: Balestri, M., Pacchiotti, A., Quazza, S., Salza, P.L., Sandri, S. (1999) Choose the best to modify the least: a new generation concatenative synthesis system. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 2291-2294, doi: 10.21437/Eurospeech.1999-499

@inproceedings{balestri99_eurospeech,
  author={Marcello Balestri and Alberto Pacchiotti and Silvia Quazza and Pier Luigi Salza and Stefano Sandri},
  title={{Choose the best to modify the least: a new generation concatenative synthesis system}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={2291--2294},
  doi={10.21437/Eurospeech.1999-499}
}