Based on a new approach that represents F0 in mora units (F0mora), several candidate parameters were proposed: the average and target values of F0 in CV and VC segments. A variable, F0ratio, was then defined to represent the F0 movement between two consecutive morae; its distributions were analyzed and used to build a model for each accent type. Evaluation results indicated that the average values of VC units and the target values of CV units performed best in the accent type identification task. To investigate the causes of these results from a perceptual viewpoint, the F0mora candidates were examined for how they relate to perceived mora pitch values, using MIDI sounds as references for perceived mora pitch (F0human). Analysis of the differences between F0human and the proposed F0mora parameters showed mismatches mainly when a pitch change occurs within a syllable. For intonation type identification, several acoustic features were proposed to represent six types of sentence-final tones, each conveying different information about the subjects' intentions and attitudes. The proposed acoustic features for relative duration and sentence-final pitch change showed good correspondence to perceptual features.
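The quantities named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for F0ratio, so the base-2 log ratio used here is an assumption, as are the function names `f0_ratios`, `midi_to_hz`, and `mismatch_semitones`. The only standard fact relied on is the equal-temperament mapping from a MIDI note number to frequency (note 69 = 440 Hz).

```python
import math


def f0_ratios(f0_mora):
    """Movement between consecutive per-mora F0 values (Hz).

    ASSUMPTION: expressed as a base-2 log ratio (octaves); the paper
    may define F0ratio differently.
    """
    if any(f <= 0 for f in f0_mora):
        raise ValueError("F0 values must be positive")
    return [math.log2(nxt / cur) for cur, nxt in zip(f0_mora, f0_mora[1:])]


def midi_to_hz(note):
    """Standard equal-temperament conversion: MIDI note 69 is 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


def mismatch_semitones(f0_mora_hz, f0_human_note):
    """Signed distance, in semitones, between a candidate F0mora value
    and a perceived pitch given as a MIDI note (hypothetical measure)."""
    return 12 * math.log2(f0_mora_hz / midi_to_hz(f0_human_note))


if __name__ == "__main__":
    # Hypothetical three-mora word: a rise followed by a fall.
    print(f0_ratios([100.0, 200.0, 100.0]))
    print(mismatch_semitones(880.0, 69))
```

A log-scale ratio is chosen here only because it makes rises and falls symmetric around zero, which is convenient when comparing distributions across accent types.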
Cite as: Ishi, C.T., Minematsu, N., Hirose, K. (2001) Recognition of accent and intonation types of Japanese using F0 parameters related to human pitch perception. Proc. ITRW on Prosody in Speech Recognition and Understanding, paper 13
@inproceedings{ishi01_prosody,
  author={Carlos Toshinori Ishi and Nobuaki Minematsu and Keikichi Hirose},
  title={{Recognition of accent and intonation types of Japanese using F0 parameters related to human pitch perception}},
  year=2001,
  booktitle={Proc. ITRW on Prosody in Speech Recognition and Understanding},
  pages={paper 13}
}