INTERSPEECH 2004 - ICSLP
In this study, we introduce a method for estimating the syntactic tree structure of Japanese speech from the F0 contour and time duration. The method estimates the syntactic structure, including the phrase that follows, from the local prosodic features of the initial and final parts of the leading phrase. It uses discriminant analysis, a statistical method based on a large amount of training data. We applied the method to the ATR 503 speech database and performed discrimination experiments. The results indicated an estimation accuracy of 84% for the branching judgment of each sequence of three leaves. In addition, discrimination accuracy saturated when only the features up to the head part of the second phrase were used. We consider this result fairly good, given the difficulty of estimating a syntactic structure that includes a future part from only local prosodic features observed in the past, and we also consider prosodic information to be very effective in real-time spoken communication.
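As a rough illustration of how discriminant analysis can turn prosodic measurements into a branching judgment, the following minimal sketch fits a two-class Fisher linear discriminant in plain Python. Everything specific here is an assumption made for illustration and is not taken from the paper: the two feature names (an F0 change at the end of the leading phrase and a pause duration), the labels 0 and 1 for left- and right-branching, the choice of Fisher's linear variant, and the sample values, which are synthetic and not drawn from the ATR 503 database.

```python
# Hypothetical, synthetic feature vectors: (f0_change, pause_duration),
# in arbitrary normalized units. These are NOT data from the paper.
LEFT = [(-1.0, 0.2), (-0.8, 0.1), (-1.2, 0.3), (-0.9, 0.25)]   # label 0
RIGHT = [(0.9, 0.6), (1.1, 0.8), (1.0, 0.5), (0.8, 0.7)]       # label 1


def mean_vec(rows):
    """Component-wise mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(2)]


def scatter(rows, mu):
    """2x2 scatter matrix of the points around their class mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fit_fisher(class0, class1):
    """Return (w, threshold) for a two-class Fisher linear discriminant."""
    m0, m1 = mean_vec(class0), mean_vec(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    # Pooled within-class scatter matrix.
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Projection direction w = Sw^-1 (m1 - m0).
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Decision threshold: projection of the midpoint between class means.
    mid = [(m0[i] + m1[i]) / 2 for i in range(2)]
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold


def predict(model, x):
    """Return 1 if x projects above the threshold, else 0."""
    w, threshold = model
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0


model = fit_fisher(LEFT, RIGHT)
```

Calling `predict(model, (f0_change, pause_duration))` on a new feature vector returns the estimated class under these assumptions. A realistic system would use many more features and far more training data than this toy example.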
Bibliographic reference. Ohsuga, Tomoko / Nishida, Masafumi / Horiuchi, Yasuo / Ichikawa, Akira (2004): "Estimating syntactic structure from prosodic features in Japanese speech", In INTERSPEECH-2004, 3041-3044.