ISCA Archive Interspeech 2015

Towards a linear dynamical model based speech synthesizer

Vassilis Tsiaras, Ranniery Maia, Vassilis Diakoloukas, Yannis Stylianou, Vassilis Digalakis

We present recent developments towards building a speech synthesis system completely based on Linear Dynamical Models (LDMs). Specifically, we describe a decision tree-based context clustering approach to LDM-based speech synthesis and an algorithm for parameter generation using global variance with LDMs. In order to capture the speech dynamics, LDMs need coarser phoneme segmentation than the 5-state segmentation usually used in Hidden Markov Model (HMM)-based speech synthesis. Therefore, using LDMs to evaluate the clustering of longer phoneme segments improves the linguistic-to-acoustic mapping and leads to trajectories of synthetic speech parameters without discontinuities and closer to the natural ones. It also decreases the footprint of the system since the total number of decision tree leaves is smaller than the total number of leaves usually produced in a typical HMM-based synthesizer. On the other hand, global variance greatly improves the naturalness of the synthesized speech. According to subjective evaluation, the proposed LDM-based system with only 25% of the parameters of a baseline HMM-based synthesizer is able to produce synthetic speech of similar quality.
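For readers unfamiliar with LDMs, the following is a minimal sketch of the generic linear state-space form that the term usually denotes: a hidden state evolves as x[t+1] = F x[t] + w[t] and each observed frame is y[t] = H x[t] + v[t]. The function name `simulate_ldm`, the matrices `F` and `H`, the initial state, and the noise settings are illustrative assumptions and are not taken from the paper. The sketch does not cover the decision tree-based clustering, the model training, or the global variance parameter generation described in the abstract.

```python
import random


def simulate_ldm(F, H, x0, n_frames, state_noise=0.0, obs_noise=0.0, seed=0):
    """Generate observation frames from a generic linear dynamical model.

    State update:  x[t+1] = F x[t] + w[t],  w ~ N(0, state_noise^2) per dimension
    Observation:   y[t]   = H x[t] + v[t],  v ~ N(0, obs_noise^2) per dimension
    All values here are hypothetical; they only illustrate the model form.
    """
    rng = random.Random(seed)

    def matvec(M, v):
        # Plain matrix-vector product on nested lists.
        return [sum(m * vj for m, vj in zip(row, v)) for row in M]

    x = list(x0)
    frames = []
    for _ in range(n_frames):
        # Emit the observation for the current hidden state.
        frames.append([yi + rng.gauss(0.0, obs_noise) for yi in matvec(H, x)])
        # Advance the hidden state by one frame.
        x = [xi + rng.gauss(0.0, state_noise) for xi in matvec(F, x)]
    return frames


# Noise-free example: a 2-dimensional state observed through its first component.
F = [[0.9, 0.1],
     [0.0, 0.8]]
H = [[1.0, 0.0]]
trajectory = simulate_ldm(F, H, x0=[1.0, 1.0], n_frames=3)
```

Because the state carries over from frame to frame, the generated trajectory is smooth within a segment. This is the property the abstract relies on when it argues for one model per longer phoneme segment rather than a sequence of short piecewise-stationary states.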

doi: 10.21437/Interspeech.2015-308

Cite as: Tsiaras, V., Maia, R., Diakoloukas, V., Stylianou, Y., Digalakis, V. (2015) Towards a linear dynamical model based speech synthesizer. Proc. Interspeech 2015, 1221-1225, doi: 10.21437/Interspeech.2015-308

  author={Vassilis Tsiaras and Ranniery Maia and Vassilis Diakoloukas and Yannis Stylianou and Vassilis Digalakis},
  title={{Towards a linear dynamical model based speech synthesizer}},
  booktitle={Proc. Interspeech 2015},