First ISCA ITRW on Auditory Quality of Systems
April 23-25, 2003
The overall auditory quality of the speech signal produced by a text-to-speech (TTS) system depends on several factors, each of which can be attributed to particular components of the synthesis system. During the development of the Dresden Speech Synthesizer DRESS, the authors investigated the influence of the individual system components separately. This paper summarizes and updates the results of these investigations.
In a recent application study, the authors had to select and adapt system components to achieve a total system footprint of less than 1 megabyte (MB). The resulting scalable low-resource system was named microDRESS. An auditory comparison of microDRESS with the baseline system DRESS indicates that inventory coding has the strongest influence on both system size and auditory quality. For the ADPCM coding scheme applied, the measured degradation of approximately one category on the mean opinion score (MOS) scale is caused mainly by the restriction to telephone bandwidth rather than by the coding algorithm itself.
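Why inventory coding dominates the footprint can be seen from a back-of-envelope calculation. The sketch below is purely illustrative: the inventory duration (120 s), the wideband baseline (16 kHz, 16-bit linear PCM) and the 4-bit-per-sample ADPCM variant (as in IMA ADPCM or 32 kbit/s G.726) are hypothetical assumptions, not parameters reported for DRESS or microDRESS. It separates the size reduction due to the telephone-band sampling rate from the reduction due to the coder's bit depth.

```python
# Back-of-envelope storage footprint of a speech-unit inventory.
# All figures are hypothetical assumptions, not values from the paper.

def inventory_bytes(duration_s: int, sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bytes needed to store duration_s seconds of audio at the given rate and resolution."""
    return duration_s * sample_rate_hz * bits_per_sample // 8

INVENTORY_SECONDS = 120  # hypothetical total duration of all stored units
MB = 1_000_000           # decimal megabyte (assumption)

# Hypothetical wideband baseline: 16 kHz sampling, 16-bit linear PCM.
baseline = inventory_bytes(INVENTORY_SECONDS, 16_000, 16)
# Telephone-band sampling (8 kHz), still 16-bit linear PCM.
narrowband_pcm = inventory_bytes(INVENTORY_SECONDS, 8_000, 16)
# Telephone-band sampling with an assumed 4-bit-per-sample ADPCM code.
narrowband_adpcm = inventory_bytes(INVENTORY_SECONDS, 8_000, 4)

for label, size in [
    ("wideband PCM", baseline),
    ("narrowband PCM", narrowband_pcm),
    ("narrowband ADPCM", narrowband_adpcm),
]:
    print(f"{label:>16}: {size / MB:.2f} MB")
```

Under these assumptions the sizes are 3.84 MB, 1.92 MB and 0.48 MB: halving the sampling rate contributes a factor of 2 and the 4-bit coding a further factor of 4. Only the last variant leaves room for the remaining system components within a 1 MB budget, which is consistent with inventory coding being the decisive size factor.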
When more system resources are available, the TTS system configuration offers several interesting options for achieving the best possible overall quality; these options are discussed in the paper.
Bibliographic reference. Jokisch, O. / Hoffmann, Rüdiger / Eichner, M. / Werner, S. / Kruschke, H. / Kordon, Ulrich (2003): "The influence of the TTS system configuration on the perceived quality of synthesized speech", In AQS-2003, 118-125.