12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Unsupervised Features from Text for Speech Synthesis in a Speech-to-Speech Translation System

Oliver Watts (1), Bowen Zhou (2)

(1) University of Edinburgh, UK
(2) IBM T.J. Watson Research Center, USA

In the context of a speech-to-speech translation system, we explore text-to-speech (TTS) conversion using linguistic features that can be extracted from unannotated text in an unsupervised, language-independent fashion. The features are intended to act as surrogates for conventional part-of-speech (POS) features. Unlike POS features, the experimental features assume only the availability of tools and data that must already be in place to build the other components of the translation system. They can therefore be used in the TTS module without incurring additional TTS-specific costs. Here we describe the use of the experimental features in a speech synthesiser, using six configurations of the system to compare the proposed features with conventional, knowledge-based POS features. We present results of objective and subjective evaluations of the usefulness of the new features.


Bibliographic reference.  Watts, Oliver / Zhou, Bowen (2011): "Unsupervised features from text for speech synthesis in a speech-to-speech translation system", In INTERSPEECH-2011, 2153-2156.