Sixth International Conference on Spoken Language Processing
This paper has five sections. In section (1) we critically discuss the fact that Text-to-Speech systems and Speech-to-Text systems have in the past been developed as if they were totally separate problems. We restrict ourselves to so-called dictation systems, L2S and S2L, which translate either written language units L into speech signals S or speech signals S into sequences of written language units L. In section (2) we argue that, for this reason, future theoretical and empirical work should be devoted to an approach that integrates the L2S and S2L components into a unified phonetic system that is able to learn to speak a language and also to understand what other L2S systems are saying.
The new Munich PHD-system will be described in section (3) as an example of such a unified approach. Fundamental to this system is the selection and definition of lexically-given speech items, both acoustically and articulatorily (EMA). In section (4) we demonstrate a set of prosodic functions that take lexically-defined L-inputs and produce phonetically well-formed connected S-outputs. We discuss the possibility of combining certain elementary functions (such as those controlling F0 variation, segment duration, and sound modification) into a much more complex function that also controls the language-specific rhythmic variation of speech tempo in its locally measurable form. Finally, section (5) will raise the question of analysing speech data produced by individual speakers as a means of arriving at the sound production system of a generalized representative member of the sociolect or dialect of the language in question.
Bibliographic reference. Tillmann, Hans G. / Pfitzinger, Hartmut R. (2000): "Parametric high definition (PHD) speech synthesis-by-analysis: the development of a fundamentally new system creating connected speech by modifying lexically-represented language units", In ICSLP-2000, vol.3, 295-297.