Eighth ISCA Workshop on Speech Synthesis

Barcelona, Catalonia, Spain
August 31-September 2, 2013

Multilingual Number Transcription for Text-to-Speech Conversion

Rubén San-Segundo (1), Juan Manuel Montero (1), Mircea Giurgiu (2), Ioana Muresan (2), Simon King (3)

(1) Speech Technology Group, ETSI Telecomunicación, UPM, Spain
(2) Dept. of Telecommun., Tech. Univ. of Cluj-Napoca, Cluj-Napoca, Romania
(3) University of Edinburgh, UK

This paper describes the text normalization module of a fully trainable text-to-speech conversion system and its application to number transcription. The main goal is to build a language-independent text normalization module that is based on data rather than on expert rules. The paper proposes a general architecture based on statistical machine translation techniques. The architecture is composed of three main modules: a tokenizer that splits the input text into a token graph, a phrase-based translation module that translates the tokens, and a post-processing module that removes some tokens. The architecture has been evaluated on number transcription, an important aspect of the text normalization problem, in three languages: English, Spanish and Romanian.

Index Terms: multilingual number transcription, text normalization, fully trainable text conversion.
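The three-module architecture named in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the positional token format (`digit_position`), the hand-written `PHRASE_TABLE`, the `<del>` placeholder, and all function names are hypothetical. The sketch also simplifies in two ways: it produces a flat token sequence rather than a token graph, and it uses a greedy longest-match lookup in place of a trained phrase-based statistical decoder.

```python
import re

# Hypothetical toy phrase table mapping source token phrases to target words.
# A trained system would learn such pairs from parallel data; these entries
# are hand-written for illustration only.
PHRASE_TABLE = {
    ("1_2",): ["one", "hundred"],
    ("1_1", "0_0"): ["ten"],
    ("2_1",): ["twenty"],
    ("0_1",): ["<del>"],
    ("0_0",): ["<del>"],
    ("1_0",): ["one"],
    ("2_0",): ["two"],
    ("3_0",): ["three"],
}


def tokenize(text):
    """Module 1 (simplified): split text into tokens.

    Each digit of a number becomes 'digit_position', where position 0 is
    the units place. Non-numeric words are kept unchanged.
    """
    tokens = []
    for word in text.split():
        if re.fullmatch(r"[0-9]+", word):
            n = len(word)
            tokens.extend(f"{d}_{n - 1 - i}" for i, d in enumerate(word))
        else:
            tokens.append(word)
    return tokens


def translate(tokens):
    """Module 2 (stand-in): greedy longest-match phrase lookup.

    This replaces the phrase-based translation step with a deterministic
    table lookup; unknown tokens are passed through unchanged.
    """
    max_len = max(len(key) for key in PHRASE_TABLE)
    out = []
    i = 0
    while i < len(tokens):
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + span])
            if key in PHRASE_TABLE:
                out.extend(PHRASE_TABLE[key])
                i += span
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


def postprocess(tokens):
    """Module 3: remove placeholder tokens and join the remaining words."""
    return " ".join(t for t in tokens if t != "<del>")


def normalize(text):
    """Run the three modules in sequence."""
    return postprocess(translate(tokenize(text)))


if __name__ == "__main__":
    print(normalize("room 123"))
```

Under these assumptions, the zeros in an input such as `100` are translated to a placeholder and then dropped by the post-processing step, which is one plausible reading of why a token-removal module is needed after translation.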


Bibliographic reference. San-Segundo, Rubén / Montero, Juan Manuel / Giurgiu, Mircea / Muresan, Ioana / King, Simon (2013): "Multilingual number transcription for text-to-speech conversion", In SSW8, 65-69.