We consider the problem of automatically detecting whether a speaker is suffering from a common cold from his/her speech. When a speaker has cold symptoms, his/her voice quality changes relative to normal. We hypothesize that such a change in voice quality could be reflected in lower likelihoods from a model built using normal speech. To capture this, we compute a 120-dimensional posteriorgram feature in each frame using the Gaussian mixture models (GMMs) of the 120 states of 40 three-state phonetic hidden Markov models (HMMs) trained on approximately 16.4 hours of normal English speech. A fixed-length 5160-dimensional phoneme state posteriorgram (PSP) feature vector is then obtained for each utterance by computing statistics over the posteriorgram feature trajectory. Experiments on the 2017 Cold sub-challenge data show that when the decisions from the bag-of-audio-words (BoAW) and end-to-end (e2e) systems are combined with those from the PSP features using an unweighted majority rule, the unweighted average recall (UAR) on the development set is 69%, which is 2.9% (absolute) higher than the best UAR obtained by the baseline schemes. Combining the decisions from the ComParE, BoAW and PSP features with a simple majority rule yields a UAR of 68.52% on the test set.
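The PSP pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each of the 120 HMM states is assumed to be modeled by a diagonal-covariance GMM, the frame posteriorgram is obtained by normalizing the state likelihoods under equal state priors (an assumption), and the utterance-level functionals (mean, standard deviation, minimum, maximum, quartiles) are hypothetical placeholders. The paper's own statistics yield 5160 dimensions, whereas these yield 720. The GMM parameters and the 39-dimensional frames are random stand-ins for trained models and real acoustic features.

```python
import numpy as np


def gmm_loglik(frames, weights, means, variances):
    """Log-likelihood of each frame under one diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means, variances: (M, D). Returns (T,).
    """
    diff = frames[:, None, :] - means[None, :, :]                     # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)   # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff**2 / variances, axis=2)   # (T, M)
    log_comp += np.log(weights)
    # Log-sum-exp over mixture components, stabilized by the per-frame max.
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True)))[:, 0]


def posteriorgram(frames, state_gmms):
    """Frame-level posteriors over HMM states (assumes equal state priors)."""
    loglik = np.stack([gmm_loglik(frames, *g) for g in state_gmms], axis=1)  # (T, S)
    loglik -= loglik.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(loglik)
    return post / post.sum(axis=1, keepdims=True)  # each row sums to 1


def utterance_stats(post):
    """Hypothetical fixed-length summary: per-state functionals over time."""
    funcs = [post.mean(axis=0), post.std(axis=0), post.min(axis=0), post.max(axis=0),
             np.percentile(post, 25, axis=0), np.percentile(post, 75, axis=0)]
    return np.concatenate(funcs)  # length = 6 * number of states


# Toy run with random "trained" GMMs: 120 states, 4 mixtures, 39-dim frames.
rng = np.random.default_rng(0)
S, M, D, T = 120, 4, 39, 200
state_gmms = [(np.full(M, 1.0 / M), rng.normal(size=(M, D)), np.ones((M, D)))
              for _ in range(S)]
frames = rng.normal(size=(T, D))
post = posteriorgram(frames, state_gmms)
feat = utterance_stats(post)
print(post.shape, feat.shape)  # → (200, 120) (720,)
```

Whatever the number of frames in an utterance, the summary vector has a fixed length, which is what allows a standard utterance-level classifier to be trained on it.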
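The decision-level fusion and the evaluation metric can be illustrated with a short sketch. UAR (unweighted average recall) is the mean of the per-class recalls, so the cold and healthy classes count equally regardless of class imbalance. The function names, the 0/1 label coding (1 = cold) and the toy decisions below are hypothetical and are not taken from the paper. With three binary systems an unweighted majority vote can never tie.

```python
import numpy as np


def majority_vote(decisions):
    """Unweighted majority over an odd number of binary systems.

    decisions: (num_systems, num_utterances) array of 0/1 labels.
    """
    decisions = np.asarray(decisions)
    # An utterance is labeled 1 when more than half of the systems vote 1.
    return (decisions.sum(axis=0) * 2 > decisions.shape[0]).astype(int)


def uar(y_true, y_pred, classes=(0, 1)):
    """Unweighted average recall: the mean of the per-class recalls."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))


# Toy labels and per-system decisions (made up for illustration only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
systems = {
    "boaw": [1, 0, 1, 0, 1, 0, 0, 0],
    "e2e":  [1, 1, 0, 0, 0, 1, 0, 0],
    "psp":  [0, 1, 0, 1, 0, 0, 0, 1],
}
for name, d in systems.items():
    print(name, round(uar(y_true, d), 4))  # boaw 0.7333, e2e 0.7333, psp 0.4667

fused = majority_vote(list(systems.values()))
print(fused.tolist(), round(uar(y_true, fused), 4))  # → [1, 1, 0, 0, 0, 0, 0, 0] 0.8333
```

In this toy case the fused decision beats every individual system because the systems make errors on different utterances; fusion helps only to the extent that the combined systems are complementary.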
Cite as: Suresh, A.K., K.M., S.R., Ghosh, P.K. (2017) Phoneme State Posteriorgram Features for Speech Based Automatic Classification of Speakers in Cold and Healthy Condition. Proc. Interspeech 2017, 3462-3466, doi: 10.21437/Interspeech.2017-1550
@inproceedings{suresh17_interspeech, author={Akshay Kalkunte Suresh and Srinivasa Raghavan K.M. and Prasanta Kumar Ghosh}, title={{Phoneme State Posteriorgram Features for Speech Based Automatic Classification of Speakers in Cold and Healthy Condition}}, year=2017, booktitle={Proc. Interspeech 2017}, pages={3462--3466}, doi={10.21437/Interspeech.2017-1550} }