One consequence of situated face-to-face conversation is the co-observability of participants’ respiratory movements and sounds. We explore whether this information can be exploited in predicting incipient speech activity. Using a methodology called stochastic turn-taking modeling, we compare the performance of a model trained on speech activity alone to one additionally trained on static and dynamic lung volume features. The methodology permits automatic discovery of temporal dependencies across participants and feature types. Our experiments show that respiratory information substantially lowers cross-entropy rates, and that this generalizes to unseen data.
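The paper's actual models are stochastic turn-taking models over multi-participant feature histories, but the evaluation metric in the comparison — cross-entropy rate of predicted speech activity on held-out frames — can be illustrated with a minimal sketch. The frame labels, probabilities, and variable names below are invented for illustration and are not from the paper's data.

```python
import math

# Hypothetical per-frame binary speech-activity labels on held-out frames
# (1 = speech activity), with probabilities assigned by two hypothetical
# models: one using speech activity alone, one additionally conditioned
# on respiratory features. All numbers are illustrative.
frames        = [1, 0, 0, 1, 0, 1, 0, 0]
p_speech_only = [0.6, 0.3, 0.4, 0.5, 0.2, 0.6, 0.3, 0.4]
p_with_resp   = [0.8, 0.2, 0.2, 0.7, 0.1, 0.8, 0.2, 0.2]

def cross_entropy_rate(labels, probs):
    """Mean negative log-likelihood of the labels, in bits per frame."""
    total = 0.0
    for y, p in zip(labels, probs):
        total += -math.log2(p if y == 1 else 1.0 - p)
    return total / len(labels)

base = cross_entropy_rate(frames, p_speech_only)
resp = cross_entropy_rate(frames, p_with_resp)
print(f"speech-only: {base:.3f} bits/frame, +respiration: {resp:.3f} bits/frame")
```

A lower cross-entropy rate means the model assigns higher probability to what actually happened; in this toy setup the respiration-augmented probabilities are better calibrated on every frame, so its rate is strictly lower.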
Cite as: Włodarczak, M., Laskowski, K., Heldner, M., Aare, K. (2017) Improving Prediction of Speech Activity Using Multi-Participant Respiratory State. Proc. Interspeech 2017, 1666-1670, doi: 10.21437/Interspeech.2017-1176
@inproceedings{wodarczak17_interspeech,
  author={Marcin Włodarczak and Kornel Laskowski and Mattias Heldner and Kätlin Aare},
  title={{Improving Prediction of Speech Activity Using Multi-Participant Respiratory State}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1666--1670},
  doi={10.21437/Interspeech.2017-1176}
}