Detecting Depression with Audio/Text Sequence Modeling of Interviews

Tuka Al Hanai, Mohammad Ghassemi, James Glass


Medical professionals diagnose depression by interpreting individuals' responses to a variety of questions that probe lifestyle changes and ongoing thoughts. Like professionals, an effective automated agent must understand that responses to queries have varying prognostic value. In this study, we demonstrate an automated depression-detection algorithm that models interviews between an individual and an agent and learns from sequences of questions and answers without the need to perform explicit topic modeling of the content. We used data from 142 individuals undergoing depression screening and modeled the interactions with audio and text features in a Long Short-Term Memory (LSTM) neural network model to detect depression. Our results were comparable to those of methods that explicitly modeled the topics of the questions and answers, which suggests that depression can be detected through sequential modeling of an interaction, with minimal information on the structure of the interview.


DOI: 10.21437/Interspeech.2018-2522

Cite as: Al Hanai, T., Ghassemi, M., Glass, J. (2018) Detecting Depression with Audio/Text Sequence Modeling of Interviews. Proc. Interspeech 2018, 1716-1720, DOI: 10.21437/Interspeech.2018-2522.


@inproceedings{AlHanai2018,
  author={Tuka {Al Hanai} and Mohammad Ghassemi and James Glass},
  title={Detecting Depression with Audio/Text Sequence Modeling of Interviews},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1716--1720},
  doi={10.21437/Interspeech.2018-2522},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2522}
}