Eighth ISCA Workshop on Speech Synthesis
Barcelona, Catalonia, Spain
This paper describes the text normalization module of a fully trainable text-to-speech conversion system and its application to number transcription. The main goal is a language-independent text normalization module based on data rather than on expert rules. The paper proposes a general architecture based on statistical machine translation techniques. It comprises three main modules: a tokenizer that splits the input text into a token graph, a phrase-based translation module that translates the tokens, and a post-processing module that removes some tokens. The architecture has been evaluated on number transcription, an important part of the text normalization problem, in several languages: English, Spanish and Romanian.
Index Terms: multilingual number transcription, text normalization, fully trainable text conversion.
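The three-module architecture can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' system: the token format (`4_T` for a digit in the tens position), the hand-written `PHRASE_TABLE`, the `<none>` placeholder, and all function names are hypothetical. A greedy longest-match lookup over a flat token list stands in for the statistical phrase-based decoder and token graph of the paper, and a trainable system would learn the phrase table from parallel data rather than list it by hand.

```python
# Toy phrase table mapping token sequences to words (hypothetical; a real
# system would learn these pairs from parallel number/text data).
PHRASE_TABLE = {
    ("1_T", "0_U"): ["ten"],
    ("1_T", "2_U"): ["twelve"],
    ("2_T",): ["twenty"],
    ("4_T",): ["forty"],
    ("0_U",): ["<none>"],  # placeholder dropped in post-processing
    ("2_U",): ["two"],
    ("5_U",): ["five"],
}

POSITION_LABELS = ["U", "T"]  # units, tens (toy: at most two digits)


def tokenize(number_text):
    """Split a digit string into position-tagged tokens, e.g. '42' -> ['4_T', '2_U']."""
    n = len(number_text)
    if not number_text.isdigit() or n > len(POSITION_LABELS):
        raise ValueError("toy tokenizer handles one- or two-digit numbers only")
    return [f"{d}_{POSITION_LABELS[n - 1 - i]}" for i, d in enumerate(number_text)]


def translate(tokens):
    """Greedy longest-match lookup, standing in for a phrase-based decoder."""
    output, i = [], 0
    while i < len(tokens):
        for span in range(len(tokens) - i, 0, -1):
            key = tuple(tokens[i:i + span])
            if key in PHRASE_TABLE:
                output.extend(PHRASE_TABLE[key])
                i += span
                break
        else:
            output.append(tokens[i])  # pass unknown tokens through unchanged
            i += 1
    return output


def postprocess(words):
    """Remove placeholder tokens that should not be spoken."""
    return [w for w in words if w != "<none>"]


def normalize(number_text):
    """Run the three stages in order: tokenize, translate, post-process."""
    return " ".join(postprocess(translate(tokenize(number_text))))
```

For example, `normalize("12")` matches the two-token phrase for "twelve" directly, while `normalize("40")` translates the zero to a placeholder that the post-processing step then removes.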
Bibliographic reference. San-Segundo, Rubén / Montero, Juan Manuel / Giurgiu, Mircea / Muresan, Ioana / King, Simon (2013): "Multilingual number transcription for text-to-speech conversion", In SSW8, 65-69.