Third European Conference on Speech Communication and Technology

Berlin, Germany
September 22-25, 1993


Intelligibility as a Function of Speech Coding Method for Template-Based Speech Synthesis

Marian Macchi, Mary Jo Altom, Dan Kahn, Sharad Singhal, Murray F. Spiegel

Bellcore (Bell Communications Research), Morristown, NJ, USA

We have been experimenting with various methods for coding the templates used in a concatenative speech synthesis system: standard pulse/noise-excited LPC; a newer waveform technique, PSOLA (pitch-synchronous overlap-and-add); and two types of residual-excited LPC (RELP): simple RELP, in which the residual was modified by truncation or padding with zeros, and PSOLA-RELP, in which PSOLA was used to modify the residual. We used these techniques to code spoken words that were similar to the templates in an inventory, and resynthesized the words. We also modified the pitch of the words, as is required by text-to-speech synthesis systems, and resynthesized the pitch-modified words. We conducted listening tests to measure the consonant intelligibility in the words with and without the pitch change. Thus we were able to see how intelligibility was affected by the coding method itself and by changes to the pitch. The results showed that RELP provided higher intelligibility than PSOLA for voiced consonants, and considerably higher intelligibility than standard pulse/noise-excited LPC, even when pitch changes were imposed on the words. In addition, simple RELP performed about as well as PSOLA-RELP.

Keywords: speech synthesis, speech coding
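
As an illustration of the simple RELP residual modification mentioned in the abstract (truncation or padding with zeros), the following is a minimal sketch, not the authors' implementation. It assumes the LPC residual has already been segmented into pitch periods, that truncation and padding are applied at the end of each period, and that pitch is changed by a single scale factor; the function names `resize_residual_period` and `change_pitch` are hypothetical. The LPC analysis and synthesis filtering steps are not shown.

```python
def resize_residual_period(period, target_len):
    """Return one pitch period of residual resized to target_len samples.

    Assumption: samples are dropped from, or zeros appended to, the END
    of the period. The abstract does not say where the change is applied.
    """
    if target_len < 0:
        raise ValueError("target_len must be non-negative")
    if target_len <= len(period):
        # Shorter period (higher pitch): truncate the residual.
        return list(period[:target_len])
    # Longer period (lower pitch): pad the residual with zeros.
    return list(period) + [0.0] * (target_len - len(period))


def change_pitch(periods, scale):
    """Rescale every pitch period by 1/scale and concatenate the result.

    periods: list of lists, one list of residual samples per pitch period
             (pitch marking is assumed to have been done elsewhere).
    scale:   hypothetical pitch factor; scale > 1 raises the pitch.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    output = []
    for period in periods:
        new_len = max(1, round(len(period) / scale))
        output.extend(resize_residual_period(period, new_len))
    return output


if __name__ == "__main__":
    residual_periods = [[0.9, -0.2, 0.1, 0.0], [0.8, -0.3, 0.2, 0.1]]
    # Raising the pitch by a factor of 2 halves each period by truncation.
    print(change_pitch(residual_periods, 2.0))
```

In a full system the modified residual would then excite the LPC synthesis filter for the corresponding frames; PSOLA-RELP would instead apply overlap-and-add to the residual rather than truncating or padding it.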

