12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Attention, Sobriety Checkpoint! Can Humans Determine by Means of Voice, if Someone is Drunk… and Can Automatic Classifiers Compete?

Stefan Ultes, Alexander Schmitt, Wolfgang Minker

Universität Ulm, Germany

This paper analyzes human performance in recognizing drunk speakers merely by voice and compares the results with the performance of an automatic statistical classifier. The study is carried out within the Interspeech 2011 Speaker State Challenge [1], employing the Alcohol Language Corpus (ALC) [2]. The 79 human subjects achieved an average performance of 55.8% unweighted accuracy on a balanced intoxicated/non-intoxicated sample set. The statistical classifier developed in this study reaches 66.6% unweighted accuracy on the test set. In comparison, the best-performing subject achieved 70.0%. Our classifier is based on 4368 acoustic and prosodic features. Incorporating linguistic features along with feature selection using Information Gain Ratio (IGR) ranking yielded a 0.7% absolute improvement while reducing the feature space size by 29%.
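The two measures named in the abstract can be illustrated with a minimal sketch. It assumes that unweighted accuracy is the mean of the per-class recalls, so that each class counts equally regardless of size, and that IGR is the information gain of a feature divided by the entropy of its value distribution. The function names, the toy labels, and the use of already discretized feature values are assumptions made for illustration; this is not the authors' implementation.

```python
import math
from collections import Counter


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


def information_gain_ratio(feature, labels):
    """Information gain of a discretized feature, normalized by its split entropy."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == v]
        conditional += len(subset) / n * entropy(subset)
    gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A constant feature carries no split information; rank it last.
    return gain / split_info if split_info > 0 else 0.0


def rank_features(features, labels):
    """Sort hypothetical feature names by descending IGR."""
    scores = {name: information_gain_ratio(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy data: "a" = intoxicated, "s" = sober (hypothetical labels).
    y_true = ["a", "a", "a", "s"]
    y_pred = ["a", "a", "s", "s"]
    print(unweighted_accuracy(y_true, y_pred))

    labels = ["s", "s", "a", "a"]
    features = {
        "informative": [0, 0, 1, 1],
        "uninformative": [0, 1, 0, 1],
    }
    print(rank_features(features, labels))
```

Continuous acoustic or prosodic features would first need to be discretized before a score of this kind can be computed; the sketch skips that step and starts from discrete values.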


  1. B. Schuller, S. Steidl, A. Batliner, F. Schiel, and J. Krajewski, “The Interspeech 2011 Speaker State Challenge,” in Proc. of the International Conference on Speech and Language Processing (ICSLP), Aug. 2011.
  2. S. B. Florian Schiel, Christian Heinrich and T. Gilg, “ALC: Alcohol Language Corpus,” in Proc. of LREC. Marrakech, Morocco: European Language Resources Association (ELRA), May 2008.

Bibliographic reference. Ultes, Stefan / Schmitt, Alexander / Minker, Wolfgang (2011): "Attention, Sobriety Checkpoint! Can Humans Determine by Means of Voice, if Someone is Drunk… and Can Automatic Classifiers Compete?", in INTERSPEECH-2011, 3221-3224.