5th International Conference on Spoken Language Processing
In this paper we describe the techniques and methodology developed for automatic labeling of segmental and prosodic information in a Mandarin speech database. The method consists of two major procedures. First, the text is converted into a phonetic network of possible pronunciations, and this network is aligned with the speech data by recognition processes. Second, a number of acoustic prosodic features are derived, and break indices are labeled from these features using decision trees. For segmental labeling, 96.5% of the automatically determined segment boundaries are accurate to within 20 ms. For prosodic labeling, 84.9% of the automatically labeled break indices agree with the manually labeled ones.
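The two figures reported above are agreement rates between automatic and manual labels. The following is a minimal sketch of how such rates could be computed, not the authors' evaluation code. It assumes that automatic and manual boundaries are already paired one-to-one, and that "within 20 ms" includes a deviation of exactly 20 ms. The function names `boundary_accuracy` and `label_agreement` and the sample values are hypothetical.

```python
from typing import Sequence


def boundary_accuracy(auto_ms: Sequence[float],
                      manual_ms: Sequence[float],
                      tolerance_ms: float = 20.0) -> float:
    """Fraction of automatic boundaries within tolerance_ms of the manual ones.

    Assumption: the two sequences are paired one-to-one (same segments, same order).
    """
    if len(auto_ms) != len(manual_ms):
        raise ValueError("boundary lists must have the same length")
    if not auto_ms:
        return 0.0
    hits = sum(1 for a, m in zip(auto_ms, manual_ms)
               if abs(a - m) <= tolerance_ms)
    return hits / len(auto_ms)


def label_agreement(auto_labels: Sequence[int],
                    manual_labels: Sequence[int]) -> float:
    """Fraction of positions where the automatic break index equals the manual one."""
    if len(auto_labels) != len(manual_labels):
        raise ValueError("label lists must have the same length")
    if not auto_labels:
        return 0.0
    matches = sum(1 for a, m in zip(auto_labels, manual_labels) if a == m)
    return matches / len(auto_labels)


# Illustrative values only; these are not data from the paper.
seg_rate = boundary_accuracy([100, 250, 400, 610], [105, 245, 430, 600])
brk_rate = label_agreement([1, 2, 3, 2], [1, 2, 2, 2])
```

In this toy example, three of the four boundaries deviate by 20 ms or less and three of the four break indices match, so both rates come out to 0.75.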
Bibliographic reference. Chou, Fu-Chiang / Tseng, Chiu-Yu / Lee, Lin-Shan (1998): "Automatic segmental and prosodic labeling of Mandarin speech database", In ICSLP-1998, paper 0266.