We present a model for detecting anxiety and depression from telephone recordings of conversations between a customer and a call-center representative, using vocal features and a deep neural network. Our binary classification model using x-vectors outperformed models using other acoustic features, such as i-vectors and openSMILE features, as well as linguistic or text-based features. Our models were built on self-reported scores: GAD-7 for anxiety and PHQ-8 for depression. Notably, the anxiety model’s performance is close to the screening accuracy of the GAD-7 score itself: a prior study that compared self-reported GAD-7 scores with a mental health professional’s diagnosis of anxiety disorder reported a sensitivity of 0.74 and a specificity of 0.54, while our model achieved a sensitivity of 0.70 and a specificity of 0.54. This study demonstrates the potential of voice analysis on topic-independent speech, particularly from 8 kHz phone conversations, to identify anxiety and depression.
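For readers less familiar with the two screening metrics quoted above, the following is a minimal sketch of how sensitivity and specificity are computed for a binary classifier. The function name `sensitivity_specificity` and the toy labels are illustrative assumptions; they are not taken from the paper and do not reproduce its data or evaluation code.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, where 1 = positive.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")

    sensitivity = tp / (tp + fn)  # share of actual positives that are detected
    specificity = tn / (tn + fp)  # share of actual negatives that are rejected
    return sensitivity, specificity


# Toy labels, made up for this example (not study data).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(toy_true, toy_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In these terms, a sensitivity of 0.70 means that 70% of the speakers labeled positive by the reference were flagged by the model, and a specificity of 0.54 means that 54% of those labeled negative were correctly not flagged.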
Cite as: Kwon, N., Hossain, S., Blaylock, N., O’Connell, H., Hachen, N., Gwin, J. (2022) Detecting Anxiety and Depression from Phone Conversations using x-vectors. Proc. Workshop on Speech, Music and Mind, 1-5, doi: 10.21437/SMM.2022-1
@inproceedings{kwon22_smm,
  author={Namhee Kwon and Shahruk Hossain and Nate Blaylock and Henry O’Connell and Naomi Hachen and Joseph Gwin},
  title={{Detecting Anxiety and Depression from Phone Conversations using x-vectors}},
  year=2022,
  booktitle={Proc. Workshop on Speech, Music and Mind},
  pages={1--5},
  doi={10.21437/SMM.2022-1}
}