11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

The Role of Higher-Level Linguistic Features in HMM-Based Speech Synthesis

Oliver Watts, Junichi Yamagishi, Simon King

Centre for Speech Technology Research, University of Edinburgh, UK

We analyse the contribution of higher-level elements of the linguistic specification of a data-driven speech synthesiser to the naturalness of the synthetic speech it generates. The system is trained using various subsets of the full feature set, from which features relating to syntactic category, intonational phrase boundary, pitch accent and boundary tones are selectively removed. Utterances synthesised by the different configurations of the system are then compared in a subjective evaluation of their naturalness. This work forms background analysis for an ongoing set of experiments in text-to-speech (TTS) conversion based on shallow features: features that can be trivially extracted from text. By building a range of systems, each assuming the availability of a different level of linguistic annotation, we obtain benchmarks for that ongoing work.

Bibliographic reference. Watts, Oliver / Yamagishi, Junichi / King, Simon (2010): "The role of higher-level linguistic features in HMM-based speech synthesis", in INTERSPEECH-2010, 841-844.