This study aims to uncover how participants in conversation predict the end of an utterance in spontaneous Japanese speech. In spontaneous everyday conversation, participants must predict the end of the speaker's utterance in order to take turns smoothly without an excessive gap. We hypothesize that they rely not only on syntactic factors but also on prosodic factors for end-of-utterance prediction, because syntactic completion points are difficult to predict in spontaneous Japanese. In previous studies, we found that prosodic features changed significantly in the final accentual phrase. However, it is not clear which prosodic features support the prediction. In this paper, we focused on the dependency structure among bunsetsu-phrases as the syntactic factor and investigated the relation between phrase dependency and prosodic features. The results showed that the average fundamental frequency and the average intensity of accentual phrases did not decline until the modified phrase appeared. Next, to predict the end of an utterance from the syntactic and prosodic features, we constructed a generalized linear mixed model. This model achieved higher accuracy than using the prosodic features alone. These results suggest that prosodic changes and phrase-dependency relations may inform the hearer that the utterance is approaching its end.
Cite as: Ishimoto, Y., Teraoka, T., Enomoto, M. (2017) End-of-Utterance Prediction by Prosodic Features and Phrase-Dependency Structure in Spontaneous Japanese Speech. Proc. Interspeech 2017, 1681-1685, doi: 10.21437/Interspeech.2017-837
@inproceedings{ishimoto17_interspeech,
  author={Yuichi Ishimoto and Takehiro Teraoka and Mika Enomoto},
  title={{End-of-Utterance Prediction by Prosodic Features and Phrase-Dependency Structure in Spontaneous Japanese Speech}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1681--1685},
  doi={10.21437/Interspeech.2017-837}
}