This paper reports on a preliminary implementation of the SpeakEZ speech synthesis system, built to explore the use of naturally occurring pronunciation, coarticulation, and prosody for speech synthesis. SpeakEZ uses concatenative synthesis, choosing target phonemes from a corpus of 115,000 prerecorded phonemes. The corpus is segmented and labeled using the Sphinx speech recognition system. During synthesis, target phonemes are selected using heuristics based on phoneme context and on position within the syllable, word, and utterance. The selected phonemes are concatenated in the time domain, with pitch-synchronous overlap-add (PSOLA) smoothing between adjacent phonemes. A preliminary evaluation shows that the system can at times produce excellent synthetic speech but still has several shortcomings.
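The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual heuristics, so everything here is an assumption for illustration: the `Unit` fields, the `WEIGHTS` values, and the `score` and `select_unit` functions are hypothetical names, and the simple additive match score stands in for whatever rules SpeakEZ really uses. The sketch covers selection only; PSOLA smoothing is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One labeled phoneme from the corpus (hypothetical schema)."""
    phoneme: str
    left: str      # preceding phoneme
    right: str     # following phoneme
    syl_pos: str   # position in syllable, e.g. "onset", "nucleus", "coda"
    word_pos: str  # position in word, e.g. "initial", "medial", "final"
    utt_pos: str   # position in utterance, e.g. "initial", "medial", "final"


# Assumed weights: phoneme context counts more than positional features.
WEIGHTS = {
    "left": 3.0,
    "right": 3.0,
    "syl_pos": 2.0,
    "word_pos": 1.0,
    "utt_pos": 1.0,
}


def score(target: Unit, candidate: Unit) -> float:
    """Sum the weights of every feature on which candidate matches target."""
    return sum(
        weight
        for feature, weight in WEIGHTS.items()
        if getattr(target, feature) == getattr(candidate, feature)
    )


def select_unit(target: Unit, corpus: list[Unit]) -> Unit:
    """Return the corpus unit of the same phoneme with the highest score."""
    candidates = [u for u in corpus if u.phoneme == target.phoneme]
    if not candidates:
        raise LookupError(f"no corpus unit for phoneme {target.phoneme!r}")
    return max(candidates, key=lambda u: score(target, u))


# Usage: two recorded instances of "ae"; the first matches both neighbors.
corpus = [
    Unit("ae", "k", "t", "nucleus", "medial", "medial"),
    Unit("ae", "b", "t", "nucleus", "initial", "final"),
]
target = Unit("ae", "k", "t", "nucleus", "initial", "initial")
best = select_unit(target, corpus)
```

With these assumed weights, the first corpus unit is preferred because it matches the target's left and right phoneme context, even though the second matches the word position.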
Keywords: Speech synthesis, time domain concatenation, corpus-based, pitch-synchronous overlap-add (PSOLA).
Bibliographic reference. Hauptmann, Alexander G. (1993): "SPEAKEZ: a first experiment in concatenation synthesis from a large corpus", In EUROSPEECH'93, 1701-1704.