Text-to-speech (TTS) synthesis in Indian English is useful for delivering messages stored on computers and on the web to Indian users who are unfamiliar with the standard English accent. Such work is under way at TIFR, and this paper reports the salient features of the front-end language processor, which generates pronunciation and stress information. The main components of the language processor are a parser to categorize words, an Indian English phonetic dictionary, a morphological analyzer, letter-to-sound rules, phonological rules, prosody rules and an Indian name detector. The rules are formulated with the aid of the large CMU pronunciation dictionary and GENEX, a language tool developed in-house that generates a sub-dictionary satisfying a set of specified constraints. The paper outlines the rule formulation procedure and provides examples of various types of rules. A few important morphological rules and letter-to-sound rules are described in detail.
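The rule-formulation step can be sketched roughly like this: the lexicon is queried for words that satisfy spelling and pronunciation constraints, and the way a letter pattern is realised is then inspected. This is a minimal illustration under stated assumptions, not GENEX itself, whose constraint language is not described here. The functions `sub_dictionary` and `tally_initial_c`, the regex-based constraint format, and the eight-word `CMU_SAMPLE` excerpt (hand-typed in CMU dictionary format, so it carries US English pronunciations) are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical hand-typed excerpt in CMU Pronouncing Dictionary style:
# word -> ARPAbet phones, with stress digits (0, 1, 2) on vowels.
CMU_SAMPLE = {
    "NATION": "N EY1 SH AH0 N",
    "STATION": "S T EY1 SH AH0 N",
    "QUESTION": "K W EH1 S CH AH0 N",
    "MOTION": "M OW1 SH AH0 N",
    "CAT": "K AE1 T",
    "CITY": "S IH1 T IY0",
    "CELL": "S EH1 L",
    "CODE": "K OW1 D",
}


def sub_dictionary(entries, spelling=None, phones=None):
    """Return the entries whose spelling and phone string match the regexes.

    A hypothetical stand-in for a constraint-driven sub-dictionary tool;
    a constraint left as None is not checked.
    """
    result = {}
    for word, pron in entries.items():
        if spelling is not None and not re.search(spelling, word):
            continue
        if phones is not None and not re.search(phones, pron):
            continue
        result[word] = pron
    return result


def tally_initial_c(entries):
    """Count how word-initial C is pronounced, split by the next letter.

    'front' means C is followed by E, I or Y; everything else is 'other'.
    The key pairs that context with the first phone of the pronunciation.
    """
    counts = Counter()
    for word, pron in sub_dictionary(entries, spelling=r"^C").items():
        context = "front" if len(word) > 1 and word[1] in "EIY" else "other"
        counts[(context, pron.split()[0])] += 1
    return counts


# Usage: find -TION words, then those ending in SH AH0 N; the rest are exceptions.
tion = sub_dictionary(CMU_SAMPLE, spelling=r"TION$")
sh = sub_dictionary(tion, phones=r"SH AH0 N$")
exceptions = sorted(set(tion) - set(sh))  # ['QUESTION']
c_counts = tally_initial_c(CMU_SAMPLE)
```

In this toy sample the `-TION` sub-dictionary flags QUESTION (`CH AH0 N`) as an exception to the `SH AH0 N` pattern. The tally supports a candidate rule of the form "word-initial C before E, I or Y is S, otherwise K". Rules for Indian English would be drawn from the same kind of evidence, but the resulting pronunciations may differ from these US English entries.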
Cite as: Sen, A. (2003) Pronunciation rules for Indian English text-to-speech system. Proc. Workshop on Spoken Language Processing, 141-148
@inproceedings{sen03_wslp, author={Aniruddha Sen}, title={{Pronunciation rules for Indian English text-to-speech system}}, year=2003, booktitle={Proc. Workshop on Spoken Language Processing}, pages={141--148} }