Bertsokantari: a TTS Based Singing Synthesis System

Eder del Blanco, Inma Hernaez, Eva Navas, Xabier Sarasola, D. Erro


This paper describes the implementation of the Aholab entry for the Singing Synthesis Challenge: Fill-in the Gap. Our approach uses an HTS-based text-to-speech (TTS) synthesizer for Basque to generate the singing voice. The prosodic parameters that the TTS system produces for a spoken version of the score are modified to meet the requirements of the musical score regarding syllable duration and pitch, while the spectral parameters are left largely unchanged. The paper details the processing developed to improve the quality of the output signal: syllable timing, generation of the intonation contour with vibrato, and manipulation of the model states. For this entry, the lyrics were freely translated into Basque and the rhythm was adapted to a traditional Basque rhythm.
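
As a rough illustration of what "generation of the intonation contour with vibrato" can look like, the sketch below builds a frame-level F0 contour for a single note: a flat target pitch taken from the score, with a sinusoidal vibrato added after a short onset delay. It is not the authors' implementation. The function names, the 5 ms frame shift, and the vibrato rate, depth, and onset values are all assumptions chosen only for the example; the only fixed fact used is the standard MIDI-to-frequency mapping (A4 = MIDI note 69 = 440 Hz).

```python
import math


def midi_to_hz(midi_note):
    """Convert a MIDI note number to frequency in Hz (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)


def note_f0_contour(
    midi_note,
    duration_s,
    frame_shift_s=0.005,       # assumed frame shift, not taken from the paper
    vibrato_rate_hz=5.5,       # assumed vibrato rate
    vibrato_depth_cents=50.0,  # assumed vibrato depth (peak deviation)
    vibrato_onset_s=0.2,       # assumed delay before vibrato starts
):
    """Return a list of F0 values (Hz), one per frame, for a single note."""
    base_hz = midi_to_hz(midi_note)
    n_frames = int(round(duration_s / frame_shift_s))
    contour = []
    for i in range(n_frames):
        t = i * frame_shift_s
        if t < vibrato_onset_s:
            # Hold the target pitch flat until the vibrato onset.
            cents = 0.0
        else:
            # Sinusoidal pitch deviation, expressed in cents.
            phase = 2.0 * math.pi * vibrato_rate_hz * (t - vibrato_onset_s)
            cents = vibrato_depth_cents * math.sin(phase)
        # 1200 cents per octave, so a deviation of c cents scales F0 by 2^(c/1200).
        contour.append(base_hz * 2.0 ** (cents / 1200.0))
    return contour


if __name__ == "__main__":
    f0 = note_f0_contour(midi_note=69, duration_s=1.0)
    print(len(f0), min(f0), max(f0))
```

In a system like the one described, a contour of this kind would replace the spoken-style F0 predicted by the TTS models, while syllable durations are stretched or compressed separately to match the note lengths in the score.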


DOI: 10.21437/Interspeech.2016-1123

Cite as

del Blanco, E., Hernaez, I., Navas, E., Sarasola, X., Erro, D. (2016) Bertsokantari: a TTS Based Singing Synthesis System. Proc. Interspeech 2016, 1240-1244.

BibTeX
@inproceedings{Blanco+2016,
author={Eder del Blanco and Inma Hernaez and Eva Navas and Xabier Sarasola and D. Erro},
title={Bertsokantari: a TTS Based Singing Synthesis System},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1123},
url={http://dx.doi.org/10.21437/Interspeech.2016-1123},
pages={1240--1244}
}