September 22-25, 1997
We describe experiments in modelling the dynamics of fluent speech, in which word pronunciations are modified by neighbouring context. Based on all-phone decoding of large volumes of training data, we automatically derive new word pronunciations and context-dependent transformation rules for phone sequences. In contrast to existing techniques, the rules can be applied even to words not in the training set, and across word boundaries, thus modelling context-dependent behaviour. We use the technique on the Wall Street Journal (WSJ) training data and apply the new pronunciations and rules to WSJ and broadcast news tests. The changes correct a significant portion of the errors they could potentially correct, but the transformations also introduce a comparable number of new errors, indicating that stronger constraints on the application of such rules may be needed.
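To make the idea of a context-dependent transformation rule concrete, the following minimal sketch applies one hand-written rule to the phone sequence of a whole utterance, so that a match can span a word boundary. It is an illustration only: the rule format (left context, target, right context, replacement), the ARPAbet-style phone symbols, the toy lexicon, and the names Rule and apply_rules are all assumptions, not taken from the paper, and the sketch does not show how rules are derived automatically from all-phone decoding.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Rule(NamedTuple):
    """Hypothetical rule format: rewrite `target` when the contexts match."""
    left: Tuple[str, ...]         # phones that must immediately precede the target
    target: Tuple[str, ...]       # phones to be rewritten (must be non-empty)
    right: Tuple[str, ...]        # phones that must immediately follow the target
    replacement: Tuple[str, ...]  # phones emitted in place of the target


def apply_rules(phones: Sequence[str], rules: Sequence[Rule]) -> List[str]:
    """Scan left to right and apply the first matching rule at each position.

    Contexts are checked against the original (untransformed) sequence.
    """
    out: List[str] = []
    i = 0
    while i < len(phones):
        for rule in rules:
            if not rule.target:
                continue  # an empty target would never advance the scan
            start = i - len(rule.left)
            end = i + len(rule.target)
            if (
                start >= 0
                and tuple(phones[start:i]) == rule.left
                and tuple(phones[i:end]) == rule.target
                and tuple(phones[end:end + len(rule.right)]) == rule.right
            ):
                out.extend(rule.replacement)
                i = end
                break
        else:
            # No rule matched here: copy the phone unchanged.
            out.append(phones[i])
            i += 1
    return out


# Toy lexicon (assumed baseline pronunciations).
lexicon = {"did": ["D", "IH", "D"], "you": ["Y", "UW"]}

# Concatenate word pronunciations so a rule can match across the boundary.
utterance = [p for word in ("did", "you") for p in lexicon[word]]

# Assumed example rule: "D Y" after "IH" becomes "JH" (as in "did you" -> "didja").
rules = [Rule(left=("IH",), target=("D", "Y"), right=(), replacement=("JH",))]

transformed = apply_rules(utterance, rules)
```

Because the rule is stated over phones rather than over word identities, the same rule would fire on any word pair that produces the sequence IH D Y, including words absent from the data the rule came from. This is the property the abstract contrasts with word-specific pronunciation lists.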
Bibliographic reference. Ravishankar, Mosur / Eskenazi, Maxine (1997): "Automatic generation of context-dependent pronunciations", In EUROSPEECH-1997, 2467-2470.