ESCA Tutorial and Research Workshop on
Speech Input/Output Assessment and Speech Databases

Noordwijkerhout, The Netherlands
September 20-23, 1989

Speech Recognizer Sensitivity to the Variation of Different Control Parameters in Synthetic Speech

Sabine Crosnier (1), Mats Blomberg (2), Kjell Elenius (2)

(1) Thesis student at KTH from ENST, Paris, France
(2) Department of Speech Communication and Music Acoustics, KTH, Stockholm, Sweden

Knowledge of a speech recognizer's sensitivity to different speech production parameters can be used to improve the system or to predict its behaviour in a given application. In this report, a speech recognition system was tested using manipulated synthetic speech. A text-to-speech system was used to produce words containing the nine Swedish long vowels in CVC context. A "normal" production of each word served as the reference template for the recognition system. The test set consisted of the same words, with the value of one control parameter at a time changed from its original value. The mel cepstrum distance between the reference and the manipulated word was measured. Modifying the pitch, the voice source spectral slope and the first four formant frequencies had a large influence on the distance, while varying the formant bandwidths had only small effects. The relation between the effects of the individual formants differs from that found in experiments with human listeners. The results indicate that the sensitivity to pitch and voice source spectrum variation will degrade the recognizer's performance in speaker-independent applications and during stress, and that some form of normalisation is needed.
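As an illustration of the kind of measurement described above, the following is a minimal sketch of a distance between a reference word and a manipulated word, each given as a sequence of mel cepstrum vectors. The abstract does not state the frame-level metric, the time alignment or the normalisation used, so the Euclidean frame distance, the dynamic time warping and the division by the summed sequence lengths are all assumptions, and the names frame_distance and dtw_distance are hypothetical.

```python
import math


def frame_distance(c_ref, c_test):
    """Euclidean distance between two cepstral vectors of equal length (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c_ref, c_test)))


def dtw_distance(ref, test):
    """Accumulated frame distance along the cheapest warping path.

    ref, test: non-empty lists of cepstral vectors, one vector per frame.
    The result is divided by len(ref) + len(test), an assumed normalisation.
    """
    n, m = len(ref), len(test)
    inf = float("inf")
    # cost[i][j] = cheapest accumulated distance aligning ref[:i] with test[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(ref[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stay on the same test frame
                                 cost[i][j - 1],      # stay on the same reference frame
                                 cost[i - 1][j - 1])  # advance both sequences
    return cost[n][m] / (n + m)


# Toy data: two frames with two cepstral coefficients each (not from the paper).
reference = [[1.0, 0.0], [2.0, 0.0]]
manipulated = [[1.0, 0.0], [2.0, 3.0]]
print(dtw_distance(reference, manipulated))
```

In an experiment of this type, such a distance would be computed once per manipulated parameter value and compared across parameters to rank their influence.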

