Expressive Singing Synthesis Based on Unit Selection for the Singing Synthesis Challenge 2016

Jordi Bonada, Martí Umbert, Merlijn Blaauw


Sample-based and statistically based singing synthesizers typically require a large amount of data to automatically generate expressive synthetic performances. In this paper we present a singing synthesizer that, using two rather small databases, is able to generate expressive synthesis from an input consisting of notes and lyrics. The system is based on unit selection and uses the Wide-Band Harmonic Sinusoidal Model to transform samples. The first database focuses on expression and consists of less than 2 minutes of free expressive singing using solely vowels. The second is the timbre database, which for the English case consists of roughly 35 minutes of monotonic singing of a set of sentences, one syllable per beat. The synthesis is divided into two steps. First, an expressive vowel singing performance of the target song is generated using the expression database. Next, this performance is used as the input control for synthesis with the timbre database and the target lyrics. A selection of synthetic performances has been submitted to the Interspeech Singing Synthesis Challenge 2016, in which they are compared to other competing systems.
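To make the unit-selection step concrete, here is a minimal, generic sketch of how such a selection is typically solved with a Viterbi-style dynamic programming search. This is an illustration only: the function names, the toy pitch-only units, and the cost functions are assumptions for the example, not the databases, features, or costs actually used in the paper.

```python
def select_units(targets, units, target_cost, concat_cost):
    """Generic unit selection by dynamic programming: choose one unit per
    target so that the sum of target costs plus concatenation costs
    between consecutive units is minimized (toy illustration)."""
    # best[i][u] = (accumulated cost of ending at unit u for target i, backpointer)
    best = [{u: (target_cost(targets[0], u), None) for u in units}]
    for i in range(1, len(targets)):
        layer = {}
        for u in units:
            # pick the predecessor minimizing accumulated + concatenation cost
            prev, (c, _) = min(best[i - 1].items(),
                               key=lambda kv: kv[1][0] + concat_cost(kv[0], u))
            layer[u] = (c + concat_cost(prev, u) + target_cost(targets[i], u), prev)
        best.append(layer)
    # backtrack from the cheapest final unit
    u, (cost, _) = min(best[-1].items(), key=lambda kv: kv[1][0])
    path = [u]
    for i in range(len(targets) - 1, 0, -1):
        u = best[i][u][1]
        path.append(u)
    return list(reversed(path)), cost

# Toy usage: units and targets are just MIDI pitches here (hypothetical data).
targets = [60, 62, 64]              # target note pitches
units = [59, 60, 61, 63, 64]        # pitches of available database units
tcost = lambda t, u: abs(t - u)             # prefer units close to the target
ccost = lambda a, b: 0.1 * abs(a - b)       # prefer smooth joins
path, cost = select_units(targets, units, tcost, ccost)
```

In the real system the units carry pitch, dynamics, and timing trajectories rather than a single pitch value, and the cost functions weigh musical-context similarity, but the search structure is the standard one shown above.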


DOI: 10.21437/Interspeech.2016-872

Cite as

Bonada, J., Umbert, M., Blaauw, M. (2016) Expressive Singing Synthesis Based on Unit Selection for the Singing Synthesis Challenge 2016. Proc. Interspeech 2016, 1230-1234.

BibTeX
@inproceedings{Bonada+2016,
author={Jordi Bonada and Martí Umbert and Merlijn Blaauw},
title={Expressive Singing Synthesis Based on Unit Selection for the Singing Synthesis Challenge 2016},
year={2016},
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-872},
url={http://dx.doi.org/10.21437/Interspeech.2016-872},
pages={1230--1234}
}