Statistical parametric synthesis of budgerigar songs

Lorenz Gutscher, Michael Pucher, Carina Lozo, Marisa Hoeschele, Daniel C. Mann


In this paper we present the synthesis of budgerigar songs with Hidden Markov Models (HMMs) and the HMM-based Speech Synthesis System (HTS). Budgerigars can produce complex and diverse sounds that are difficult to categorize. We adapted techniques commonly used in speech synthesis so that they can be applied to the synthesis of budgerigar songs. To segment the recordings, the songs are broken down into phrases, which are sounds separated by silence. Complex phrases can furthermore be subdivided into smaller units, which are then clustered to identify recurring elements. These element categories, along with additional contextual information, are used to enhance training and synthesis. Overall, the aim of the process is to offer an interface that generates new sequences and compositions of bird songs based on user input, consisting of the desired song structure and contextual information. Finally, an objective evaluation comparing the synthesized output to the natural recording is performed, and a subjective evaluation with human listeners shows that they prefer resynthesized over natural recordings and that they perceive no significant differences in terms of naturalness between natural, resynthesized, and synthesized versions.
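
The abstract defines phrases as sounds separated by silence but does not say how the silences are detected. The following is only a minimal sketch of one common approach, energy-based thresholding on short frames; the function name `segment_phrases`, the frame length, the RMS threshold, and the minimum silence duration are all illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def segment_phrases(signal, sample_rate, frame_ms=10.0,
                    threshold=0.01, min_silence_ms=50.0):
    """Return (start_sec, end_sec) pairs for sounds separated by silence.

    All parameters are hypothetical defaults chosen for illustration.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000.0))
    n_frames = len(signal) // frame_len
    frames = np.asarray(signal[:n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)

    # A frame counts as "sound" if its RMS energy exceeds the threshold.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    is_sound = rms > threshold

    # Number of consecutive silent frames needed to close a phrase.
    min_gap = max(1, int(round(min_silence_ms / frame_ms)))

    phrases = []
    start = None        # index of the first frame of the open phrase
    silence_run = 0     # silent frames seen since the last sound frame
    for i, sound in enumerate(is_sound):
        if sound:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_gap:
                end = i - silence_run + 1  # first silent frame after the phrase
                phrases.append((start * frame_len / sample_rate,
                                end * frame_len / sample_rate))
                start = None
                silence_run = 0

    # Close a phrase that is still open at the end of the recording.
    if start is not None:
        end = n_frames - silence_run
        phrases.append((start * frame_len / sample_rate,
                        end * frame_len / sample_rate))
    return phrases
```

Short pauses below `min_silence_ms` stay inside a phrase, so a complex phrase with brief internal gaps is kept whole and could be subdivided in a later step.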


 DOI: 10.21437/SSW.2019-23

Cite as: Gutscher, L., Pucher, M., Lozo, C., Hoeschele, M., Mann, D.C. (2019) Statistical parametric synthesis of budgerigar songs. Proc. 10th ISCA Speech Synthesis Workshop, 127-131, DOI: 10.21437/SSW.2019-23.


@inproceedings{Gutscher2019,
  author={Lorenz Gutscher and Michael Pucher and Carina Lozo and Marisa Hoeschele and Daniel C. Mann},
  title={{Statistical parametric synthesis of budgerigar songs}},
  year=2019,
  booktitle={Proc. 10th ISCA Speech Synthesis Workshop},
  pages={127--131},
  doi={10.21437/SSW.2019-23},
  url={http://dx.doi.org/10.21437/SSW.2019-23}
}