11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Modeling of Sentence-Medial Pauses in Bangla Readout Speech: Occurrence and Duration

Shyamal Das Mandal (1), Arup Saha (1), Tulika Basu (1), Keikichi Hirose (2), Hiroya Fujisaki (2)

(1) C-DAC, India
(2) University of Tokyo, Japan

Control of pause occurrence and duration is an important issue for text-to-speech synthesis systems. In text-readout speech, pauses occur unconditionally at sentence boundaries and with high probability at major syntactic boundaries such as clause boundaries, but more or less arbitrarily at minor syntactic boundaries. Pause duration tends to be longer at the end of a longer syntactic unit. A detailed analysis of sentence-medial pauses is conducted for readout speech of Bangla. Based on the results, linear models of pause occurrence and pause duration are constructed, using the length of the syntactic unit and the distance to the directly modifying word as variables. The models are evaluated on test data not included in the analyzed data (open-test condition). The results show that the proposed models predict the occurrence probability correctly for 87% of phrase boundaries, and predict pause duration to within 100 ms in 80% of the cases.
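
As a rough illustration of the kind of linear model described above, the following Python sketch predicts a value from the two variables named in the abstract and computes the share of predictions that fall within a tolerance of the observed durations. It is a minimal sketch under stated assumptions, not the authors' implementation: the class and function names, the coefficient values, the units of the two variables, and the toy numbers in the usage note are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LinearPauseModel:
    """Linear model over the two variables named in the abstract.

    All coefficients are hypothetical placeholders; the abstract does not
    report the fitted values or the units of the variables.
    """
    intercept: float
    w_length: float    # weight on syntactic unit length (unit assumed)
    w_distance: float  # weight on distance to the directly modifying word

    def predict(self, unit_length: float, distance: float) -> float:
        """Return the linear combination of the two variables."""
        return (self.intercept
                + self.w_length * unit_length
                + self.w_distance * distance)


def clamp_probability(value: float) -> float:
    """Clip a linear output to [0, 1] so it can be read as a probability."""
    return min(1.0, max(0.0, value))


def fraction_within(predicted: Sequence[float],
                    observed: Sequence[float],
                    tolerance_ms: float = 100.0) -> float:
    """Share of predictions whose absolute error is at most tolerance_ms."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, o in zip(predicted, observed)
               if abs(p - o) <= tolerance_ms)
    return hits / len(observed)
```

For example, with made-up coefficients `LinearPauseModel(100.0, 20.0, 10.0)`, a boundary with unit length 5 and distance 2 gives a predicted duration of 220.0 ms. The same class could serve for occurrence probability if its output is passed through `clamp_probability`, which is one possible way to keep a linear output in a valid range.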

