We previously proposed a statistical model of moraic transitions of fundamental frequency (F0) contours and showed its effectiveness for prosodic boundary detection and accent type recognition. That model represented the F0 contours of prosodic words in order to detect prosodic word boundaries and recognize accent types simultaneously. This paper proposes a method in which prosodic word F0 contours are modeled separately according to their accent types and the presence or absence of a succeeding pause. An utterance is regarded as a sequence of prosodic words under a simple grammar. Each moraic F0 contour is represented by a pair of codes: the original shape code and a newly introduced delta code, which represents the degree of F0 change between the mora in question and the preceding mora. Compared with the earlier results, the boundary detection rate improved from 87.7% to 91.5%. The accent type recognition rate reached 76.0% (type 1 accent discrimination).
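As a rough illustration of the code-pair representation, the sketch below derives a delta code for each mora by quantizing the change in F0 from the preceding mora, then pairs it with a shape code. The abstract does not specify how either code is computed, so everything here is an assumption made for illustration: the use of one mean log-F0 value per mora, the three-level quantization, the threshold values, the shape labels, and the function names `delta_codes` and `pair_codes`.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple


def delta_codes(
    mora_log_f0: Sequence[float],
    boundaries: Sequence[float] = (-0.1, 0.1),  # hypothetical thresholds
) -> List[Optional[int]]:
    """Quantize the F0 change between each mora and its preceding mora.

    With the default boundaries the codes are 0 (fall), 1 (roughly flat)
    and 2 (rise). The first mora has no predecessor, so its code is None.
    """
    if not mora_log_f0:
        return []
    codes: List[Optional[int]] = [None]
    for prev, cur in zip(mora_log_f0, mora_log_f0[1:]):
        # bisect_right returns the index of the interval containing the change
        codes.append(bisect_right(boundaries, cur - prev))
    return codes


def pair_codes(
    shape_codes: Sequence[str],
    mora_log_f0: Sequence[float],
) -> List[Tuple[str, Optional[int]]]:
    """Pair each mora's (already assigned) shape code with its delta code."""
    if len(shape_codes) != len(mora_log_f0):
        raise ValueError("need exactly one shape code per mora")
    return list(zip(shape_codes, delta_codes(mora_log_f0)))


# Hypothetical four-mora prosodic word: mean log-F0 per mora and shape labels.
observations = pair_codes(["rise", "flat", "flat", "fall"], [5.0, 5.3, 5.25, 4.9])
```

A sequence of such pairs could then serve as the discrete observation sequence for the per-accent-type prosodic word models; how the original system quantizes and models the codes may differ.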
Cite as: Iwano, K., Hirose, K. (1998) Representing prosodic words using statistical models of moraic transition of fundamental frequency contours of Japanese. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0731, doi: 10.21437/ICSLP.1998-116
@inproceedings{iwano98_icslp,
  author={Koji Iwano and Keikichi Hirose},
  title={{Representing prosodic words using statistical models of moraic transition of fundamental frequency contours of Japanese}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0731},
  doi={10.21437/ICSLP.1998-116}
}