5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Factors Affecting Perceived Quality and Intelligibility in the CHATR Concatenative Speech Synthesiser

Nick Campbell, Yoshiharu Itoh, Wen Ding, Norio Higuchi

ATR Interpreting Telecommunications Research Laboratories

To eliminate trial and error when selecting a speech database to serve as the voice source for concatenative speech synthesis, and to determine which acoustic and prosodic characteristics best predict the 'appeal' or perceived 'quality' of the synthesised speech, we performed tests to evaluate listener preferences over a range of different synthesised voices. We found that the variation of fundamental frequency (F0) in the source database and the open quotient of the glottis, as measured by joint estimation (ARX), were the best correlates of listener preference. To our surprise, the test data showed very little correlation between the scores for 'intelligibility' and those for 'naturalness'. However, intelligibility scores correlated closely with durational characteristics, and naturalness scores with pitch and loudness.
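The abstract does not state which correlation statistic was used, so the following is only a minimal sketch, assuming Pearson's r computed across voices. It ranks candidate per-voice measures (for example F0 variation or glottal open quotient) by the strength of their correlation with per-voice mean listener scores. The function names `pearson_r` and `rank_correlates` and the feature keys are hypothetical illustrations, not part of CHATR.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variance terms (the 1/n factors cancel in the ratio).
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def rank_correlates(
    features: Dict[str, List[float]], scores: List[float]
) -> List[Tuple[str, float]]:
    """Rank per-voice measures by |r| against per-voice mean listener scores.

    features: hypothetical mapping from a measure name (e.g. "f0_sd",
              "open_quotient") to one value per voice.
    scores:   one mean listener score per voice, in the same voice order.
    """
    ranked = [(name, pearson_r(values, scores)) for name, values in features.items()]
    # Strongest association first, regardless of sign.
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked
```

The same `pearson_r` call, applied to per-voice intelligibility scores and naturalness scores, would quantify how weakly the two ratings agree. Sorting by absolute value keeps strong negative correlates near the top, since a measure that lowers preference is as informative for database selection as one that raises it.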

