Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Concatenative Arabic Speech Synthesis Using Large Speech Database

Wael M. Hamza (1), Mohsen A. Rashwan (2)

(1) IBM European Speech Research, IBM Egypt
(2) Cairo University, Egypt

Speech synthesis has attracted considerable research interest because it is an essential part of a complete text-to-speech system. This paper proposes an Arabic speech synthesis system that belongs to the family of concatenative synthesizers built on a large speech database. The concatenation unit inventory is constructed automatically from one hour of pre-recorded speech using context-dependent HMMs. A unified approach to unit selection is introduced so that any type of concatenation unit can be used. Introducing a context cost into the unit selection algorithm makes it easy to use longer and non-uniform units within the same framework. The context cost is defined as the distance between leaves of the context clustering trees grown during HMM acoustic modeling. Selected unit occurrences are time-scaled and/or pitch-scaled to match the required target. This is done with an adapted version of the sinusoidal model, referred to as the Pitch-Synchronous All-Harmonic model. The resulting system was evaluated with two types of tests: it achieved a word error rate of 10.3% in a DRT-like test and a score of 3.8 in a subjective MOS-like test. These results show that the proposed system can be used as a front-end synthesizer of a complete Arabic text-to-speech system.
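The role of the context cost in unit selection can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give the exact definitions, so everything below is an assumption made for illustration: each clustering-tree leaf is encoded as its path of 0/1 answers from the root, the leaf distance is taken to be the number of tree edges between two leaves, the join cost is supplied by the caller, and the names `leaf_distance`, `select_units`, `w_context`, and `w_join` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical encoding: a leaf is its path of 0/1 answers from the tree root.
Leaf = Tuple[int, ...]
# A candidate unit occurrence: (database index, leaf of its original context).
Candidate = Tuple[int, Leaf]


def leaf_distance(a: Leaf, b: Leaf) -> int:
    """Number of tree edges between two leaves (an assumed distance measure)."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def select_units(
    targets: Sequence[Leaf],
    candidates: Sequence[Sequence[Candidate]],
    join_cost: Callable[[int, int], float],
    w_context: float = 1.0,
    w_join: float = 1.0,
) -> Tuple[List[int], float]:
    """Viterbi search minimizing weighted context cost plus join cost."""
    # best[i][j] = (cumulative cost, backpointer) for candidate j at position i
    best = [[(w_context * leaf_distance(targets[0], leaf), -1)
             for _, leaf in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for uid, leaf in candidates[i]:
            ctx = w_context * leaf_distance(targets[i], leaf)
            # Cheapest way to reach this candidate from any previous candidate.
            cost, back = min(
                (best[i - 1][k][0] + w_join * join_cost(prev_uid, uid), k)
                for k, (prev_uid, _) in enumerate(candidates[i - 1])
            )
            row.append((cost + ctx, back))
        best.append(row)

    # Backtrace from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j][0])
        j = best[i][j][1]
    path.reverse()
    return path, total


def contiguity_join_cost(prev_uid: int, uid: int) -> float:
    """Assumed join cost: free if the units are adjacent in the database."""
    return 0.0 if uid == prev_uid + 1 else 1.0
```

With a join cost such as `contiguity_join_cost`, raising `w_join` relative to `w_context` makes the search prefer runs of units that were adjacent in the recorded database, which is one simple way that longer, non-uniform units can emerge from a single selection framework.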

Bibliographic reference. Hamza, Wael M. / Rashwan, Mohsen A. (2000): "Concatenative Arabic speech synthesis using large speech database", In ICSLP-2000, vol.2, 182-185.