We present recent developments towards building a speech synthesis system completely based on Linear Dynamical Models (LDMs). Specifically, we describe a decision tree-based context clustering approach to LDM-based speech synthesis and an algorithm for parameter generation using global variance with LDMs. In order to capture the speech dynamics, LDMs need coarser phoneme segmentation than the 5-state segmentation usually used in Hidden Markov Model (HMM)-based speech synthesis. Therefore, using LDMs to evaluate the clustering of longer phoneme segments improves the linguistic-to-acoustic mapping and leads to trajectories of synthetic speech parameters without discontinuities and closer to the natural ones. It also decreases the footprint of the system since the total number of decision tree leaves is smaller than the total number of leaves usually produced in a typical HMM-based synthesizer. On the other hand, global variance greatly improves the naturalness of the synthesized speech. According to subjective evaluation, the proposed LDM-based system with only 25% of the parameters of a baseline HMM-based synthesizer is able to produce synthetic speech of similar quality.
Bibliographic reference. Tsiaras, Vassilis / Maia, Ranniery / Diakoloukas, Vassilis / Stylianou, Yannis / Digalakis, Vassilis (2015): "Towards a linear dynamical model based speech synthesizer", In INTERSPEECH-2015, 1221-1225.