12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Using Prosodic and Spectral Features in Detecting Depression in Elderly Males

Michelle Hewlett Sanchez (1), Dimitra Vergyri (1), Luciana Ferrer (1), Colleen Richey (1), Pablo Garcia (1), Bruce Knoth (1), William Jarrold (2)

(1) SRI International, USA
(2) University of California at Davis, USA

As research in speech processing has matured, interest has grown in paralinguistic speech processing problems, including assessing the speaker's mental and psychological health. In this study, we focus on speech features that can identify the speaker's emotional health, i.e., whether or not the speaker is depressed. We use prosodic speech measurements, such as pitch and energy, in addition to spectral features, such as formants and spectral tilt, and compute statistics of these features over different regions of the speech signal. These statistics are used as input features to a discriminative classifier that predicts the speaker's depression state. With an N-fold leave-one-out cross-validation setup, we achieve a prediction accuracy of 81.3%, against a chance level of 50%.
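
The pipeline summarized in the abstract (statistics of per-frame measurements fed to a classifier, scored by leave-one-out cross-validation) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the choice of statistics, and the nearest-centroid classifier are all assumptions standing in for the unspecified feature set and discriminative classifier.

```python
import statistics


def contour_stats(contour):
    """Summarize one per-frame contour (e.g. pitch or energy) with four
    statistics. The particular statistics are an assumption."""
    return [
        statistics.mean(contour),
        statistics.pstdev(contour),
        min(contour),
        max(contour),
    ]


def featurize(pitch, energy):
    """Concatenate the statistics of two contours into one feature vector.
    In practice the features would need scaling to comparable ranges."""
    return contour_stats(pitch) + contour_stats(energy)


def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (NOT the one used in the paper): predict the
    label whose mean training vector is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return min(centroids, key=lambda lab: sq_dist(centroids[lab], x))


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, and report the
    fraction of held-out samples that are predicted correctly."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)
```

With N speakers, `leave_one_out_accuracy` trains N models, each tested on the single speaker left out, so no speaker is used for both training and testing in the same fold.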

Bibliographic reference. Sanchez, Michelle Hewlett / Vergyri, Dimitra / Ferrer, Luciana / Richey, Colleen / Garcia, Pablo / Knoth, Bruce / Jarrold, William (2011): "Using prosodic and spectral features in detecting depression in elderly males", in INTERSPEECH-2011, 3001-3004.