5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Automatic Generation of Context-Dependent Pronunciations

Mosur Ravishankar, Maxine Eskenazi

School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

We describe experiments in modelling the dynamics of fluent speech in which word pronunciations are modified by neighbouring context. Based on all-phone decoding of large volumes of training data, we automatically derive new word pronunciations and context-dependent transformation rules for phone sequences. In contrast to existing techniques, the rules can be applied even to words not in the training set, and across word boundaries, thus modelling context-dependent behaviour. We use the technique on the Wall Street Journal (WSJ) training data and apply the new pronunciations and rules to WSJ and broadcast news tests. The changes correct a significant portion of the errors they could potentially correct. However, the transformations introduce a comparable number of new errors, indicating that stronger constraints on the application of such rules may be needed.
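
As a rough illustration of what a context-dependent transformation rule applied across word boundaries might look like, the sketch below rewrites a phone when its left and right neighbours match a rule. The abstract does not specify the rule format, so the (left, target, right, replacement) tuple, the toy lexicon, the flapping rule, and the function names are all assumptions made for this example, not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical rule format: (left context, target phone, right context, replacement).
Rule = Tuple[str, str, str, str]

# Toy baseform lexicon and a single illustrative rule (both assumed, not from the paper):
# "T" is rewritten as the flap "DX" when it sits between "AH" and "AA".
LEXICON: Dict[str, List[str]] = {
    "what": ["W", "AH", "T"],
    "are": ["AA", "R"],
}
RULES: List[Rule] = [("AH", "T", "AA", "DX")]


def apply_rules(phones: List[str], rules: List[Rule]) -> List[str]:
    """Rewrite each interior phone whose neighbours match a rule's context."""
    out = list(phones)
    for i in range(1, len(phones) - 1):
        for left, target, right, replacement in rules:
            # Contexts are matched against the original sequence, so all
            # rules see the same untransformed neighbours.
            if (phones[i - 1], phones[i], phones[i + 1]) == (left, target, right):
                out[i] = replacement
                break
    return out


def transform_utterance(
    words: List[str], lexicon: Dict[str, List[str]], rules: List[Rule]
) -> List[str]:
    """Concatenate baseforms so rule contexts can span word boundaries."""
    phones = [p for w in words for p in lexicon[w]]
    return apply_rules(phones, rules)


print(transform_utterance(["what", "are"], LEXICON, RULES))
```

Because the rule is stated over phones rather than whole words, it applies to any word sequence that yields the matching phone context, including words never seen when the rule was derived. In isolation, "what" keeps its final "T", since the right context lies in the following word.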


Bibliographic reference.  Ravishankar, Mosur / Eskenazi, Maxine (1997): "Automatic generation of context-dependent pronunciations", In EUROSPEECH-1997, 2467-2470.