The ESCA Workshop on Speech Synthesis
September 25-28, 1990
The problem of adequately modeling the dynamics of the speech spectrum is explored for general text-to-speech applications. Formant patterns analyzed from natural English speech are compared over time with those produced by the MITalk system, noting where the system has difficulty modeling spectral transitions. Phonetic contexts that would be most difficult for a diphone approach are also noted, i.e., contexts where the coarticulation assumption underlying diphones is invalid. Improving phoneme-based (phoneme-concatenation) synthesis requires better rules for modeling coarticulation. To improve diphone synthesis, I enumerate contexts where triphones would better model natural speech.
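The diphone/triphone distinction can be made concrete with a minimal sketch that is not taken from the paper. A diphone unit spans the transition from one phone into the next, so concatenating diphones assumes that a phone's midpoint looks the same whatever precedes it. A triphone instead carries both the left and the right neighbour as context. The silence symbol `sil`, the phone labels, and the function names `diphones` and `triphones` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

SIL = "sil"  # hypothetical silence marker padding the utterance edges


def diphones(phones: List[str]) -> List[Tuple[str, str]]:
    """Split a phone sequence into diphone units, one per adjacent pair.

    Each unit runs from the middle of one phone to the middle of the next,
    so only the immediate transition is stored with the unit.
    """
    padded = [SIL] + phones + [SIL]
    return [(padded[i], padded[i + 1]) for i in range(len(padded) - 1)]


def triphones(phones: List[str]) -> List[Tuple[str, str, str]]:
    """Pair every phone with its left and right neighbours as context."""
    padded = [SIL] + phones + [SIL]
    return [(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]


if __name__ == "__main__":
    a = ["k", "u", "l"]  # hypothetical phone strings differing only
    b = ["t", "u", "l"]  # in the first consonant
    # Diphone synthesis would reuse the same ("u", "l") unit in both words,
    # assuming the left consonant does not affect the vowel's second half.
    print(set(diphones(a)) & set(diphones(b)))
    # Triphones keep the vowel's full context, so the units differ.
    print(set(triphones(a)) & set(triphones(b)))
```

Running the script shows that the two sequences share the `("u", "l")` and `("l", "sil")` diphones but no vowel triphone. Contexts where that shared diphone fails to match natural speech are the kind of case in which a triphone unit would be the better model.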
Bibliographic reference. O'Shaughnessy, Douglas (1990): "Spectral transitions in rule-based and diphone synthesis", In SSW1-1990, 21-24.