This paper presents an automatic approach to recognizing speaker physical load using posterior probability based features from acoustic and phonetic tokens. In this method, the tokens used to calculate the posterior probabilities, or zero-order statistics, are extended from the components of conventional MFCC-trained Gaussian Mixture Models (GMMs) to phonemes from parallel phoneme recognizers and to the components of GMMs trained on tandem features. Phoneme recognizers for five different languages are employed to extract the phoneme posterior probabilities. We show that these histogram-style features at both the acoustic and phonetic levels are effective and complementary for capturing speaker physical load information from short utterances. A support vector machine is adopted as the supervised classifier. Combining the proposed methods with the OpenSMILE baseline, which covers acoustic and prosodic information, further improves the final performance. The proposed fusion system achieves 70.18% and 72.81% unweighted accuracy on the validation and test sets, respectively, of the Munich Bio-voice Corpus for the binary physical load level recognition task in the INTERSPEECH 2014 Computational Paralinguistics Challenge.
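As an illustration of the histogram-style features described above, the following minimal sketch accumulates per-frame GMM component posteriors over an utterance into zero-order statistics. It is not the paper's implementation: the function names, the diagonal-covariance assumption, the normalisation by frame count, and the toy one-dimensional GMM are all assumptions made for this example. Phoneme posteriors produced by a phoneme recognizer could be accumulated in the same way, with phonemes taking the place of GMM components.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def frame_posteriors(x, weights, means, variances):
    """Posterior probability of each GMM component (token) given one frame."""
    log_p = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(log_p)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lp - top) for lp in log_p]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def zero_order_stats(frames, weights, means, variances):
    """Sum the posteriors over all frames and normalise by the frame count.

    The normalisation (an assumption of this sketch) yields a histogram-style
    vector that sums to 1 regardless of utterance length.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    counts = [0.0] * len(weights)
    for x in frames:
        for c, p in enumerate(frame_posteriors(x, weights, means, variances)):
            counts[c] += p
    return [n / len(frames) for n in counts]


# Toy example (hypothetical values): a 2-component GMM over 1-D features.
weights = [0.5, 0.5]
means = [[0.0], [5.0]]
variances = [[1.0], [1.0]]
frames = [[0.1], [-0.2], [5.1]]  # two frames near component 0, one near component 1

feats = zero_order_stats(frames, weights, means, variances)
print(feats)
```

In this toy case roughly two thirds of the posterior mass falls on the first component. A fixed-length vector of this kind, computed per utterance, is the sort of input that a supervised classifier such as a support vector machine can consume.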
Bibliographic reference. Li, Ming (2014): "Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens", In INTERSPEECH-2014, 437-441.