Optimizing Speech-Input Length for Speaker-Independent Depression Classification

Tomasz Rutowski, Amir Harati, Yang Lu, Elizabeth Shriberg


Machine learning models for speech-based depression classification offer promise for health care applications. Despite growing work on depression classification, little is understood about how speech-input length affects model performance. We analyze results for speaker-independent depression classification using a corpus of over 1400 hours of speech from a human-machine health screening application. We examine performance as a function of response input length for two NLP systems that differ in overall performance.

Results for both systems show that performance depends on natural length, elapsed length, and ordering of the response within a session. The systems share a minimum length threshold but differ in their response saturation threshold, which is higher for the better system. At saturation, it is better to pose a new question to the speaker than to continue the current response. These and additional reported results suggest how applications can be better designed to both elicit and process optimal input lengths for depression classification.


DOI: 10.21437/Interspeech.2019-3095

Cite as: Rutowski, T., Harati, A., Lu, Y., Shriberg, E. (2019) Optimizing Speech-Input Length for Speaker-Independent Depression Classification. Proc. Interspeech 2019, 3023-3027, DOI: 10.21437/Interspeech.2019-3095.


@inproceedings{Rutowski2019,
  author={Tomasz Rutowski and Amir Harati and Yang Lu and Elizabeth Shriberg},
  title={{Optimizing Speech-Input Length for Speaker-Independent Depression Classification}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3023--3027},
  doi={10.21437/Interspeech.2019-3095},
  url={http://dx.doi.org/10.21437/Interspeech.2019-3095}
}