Using Fisher Vector and Bag-of-Audio-Words Representations to Identify Styrian Dialects, Sleepiness, Baby & Orca Sounds

Gábor Gosztolya


The 2019 INTERSPEECH Computational Paralinguistics Challenge (ComParE) consists of four Sub-Challenges, where the tasks are to identify different German (Austrian) dialects, estimate how sleepy the speaker is, determine what type of sound a given baby uttered, and detect whether the sound of an orca (killer whale) is present in the recording. Following our team's entry last year, we continue our research by looking for feature set types that can be employed on a wide variety of tasks without alteration. This year, besides the standard 6373-dimensional ComParE functionals, we experimented with the Fisher vector representation along with the Bag-of-Audio-Words technique. To adapt Fisher vectors from the field of image processing, we applied them to standard MFCC features instead of the originally intended SIFT attributes (which describe local objects found in the image). Our results indicate that using these feature representation techniques was indeed beneficial, as we outperformed the baseline values in three of the four Sub-Challenges; the performance of our approach appears even stronger considering that the baseline scores were themselves obtained by combining different methods.
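To make the two utterance-level representations named above more concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes frame-level features (for instance MFCC vectors) are already available as lists of floats, that a codebook and a diagonal-covariance GMM have been estimated beforehand, and it computes only the Fisher vector components for the GMM means. The function names `boaw_histogram` and `fisher_vector_means`, as well as the toy data, are hypothetical and chosen purely for illustration.

```python
import math


def boaw_histogram(frames, codebook):
    """Bag-of-Audio-Words: normalised histogram of nearest-codeword indices."""
    counts = [0] * len(codebook)
    for x in frames:
        # Squared Euclidean distance from this frame to every codeword.
        dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook]
        counts[dists.index(min(dists))] += 1
    total = len(frames)
    return [c / total for c in counts]


def fisher_vector_means(frames, weights, means, sigmas):
    """Fisher vector restricted to the gradient w.r.t. the GMM means.

    Assumes a GMM with diagonal covariances; sigmas holds per-dimension
    standard deviations. Returns a flat list of length K * D.
    """
    K, D, T = len(weights), len(means[0]), len(frames)
    acc = [[0.0] * D for _ in range(K)]
    for x in frames:
        # Log of w_k * N(x | mu_k, diag(sigma_k^2)) for each component k.
        logp = []
        for k in range(K):
            ll = math.log(weights[k])
            for d in range(D):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                ll -= 0.5 * z * z + math.log(sigmas[k][d]) + 0.5 * math.log(2 * math.pi)
            logp.append(ll)
        # Posterior (soft assignment) of each component, computed stably.
        m = max(logp)
        unnorm = [math.exp(v - m) for v in logp]
        s = sum(unnorm)
        for k in range(K):
            gamma = unnorm[k] / s
            for d in range(D):
                acc[k][d] += gamma * (x[d] - means[k][d]) / sigmas[k][d]
    # Normalise by the number of frames and the square root of the weight.
    return [acc[k][d] / (T * math.sqrt(weights[k])) for k in range(K) for d in range(D)]


# Toy 2-dimensional "frames" standing in for real MFCC vectors (assumption).
demo_frames = [[0.0, 0.0], [0.1, -0.1], [5.0, 5.0]]
demo_codebook = [[0.0, 0.0], [5.0, 5.0]]
demo_hist = boaw_histogram(demo_frames, demo_codebook)
demo_fv = fisher_vector_means(
    demo_frames,
    weights=[0.5, 0.5],
    means=demo_codebook,
    sigmas=[[1.0, 1.0], [1.0, 1.0]],
)
```

Both functions map a variable-length sequence of frames to a fixed-length vector (one value per codeword for the histogram, one value per component and dimension for the Fisher vector), which is what allows a standard classifier or regressor to be trained on whole utterances. A fuller Fisher vector would typically also include the gradients with respect to the variances, followed by power and L2 normalisation; those steps are omitted here for brevity.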


DOI: 10.21437/Interspeech.2019-1726

Cite as: Gosztolya, G. (2019) Using Fisher Vector and Bag-of-Audio-Words Representations to Identify Styrian Dialects, Sleepiness, Baby & Orca Sounds. Proc. Interspeech 2019, 2413-2417, DOI: 10.21437/Interspeech.2019-1726.


@inproceedings{Gosztolya2019,
  author={Gábor Gosztolya},
  title={{Using Fisher Vector and Bag-of-Audio-Words Representations to Identify Styrian Dialects, Sleepiness, Baby \& Orca Sounds}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2413--2417},
  doi={10.21437/Interspeech.2019-1726},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1726}
}