Calibrating DNN Posterior Probability Estimates of HMM/DNN Models to Improve Social Signal Detection from Audio Data

Gábor Gosztolya, László Tóth


To detect social signals such as laughter or filler events from audio data, a straightforward choice is to apply a Hidden Markov Model (HMM) in combination with a Deep Neural Network (DNN) that supplies the local class posterior estimates (the HMM/DNN hybrid model). However, the posterior estimates of the DNN may be suboptimal due to a mismatch between the cost function used during training (e.g. frame-level cross-entropy) and the actual evaluation metric (e.g. segment-level F1 score). In this study, we show experimentally that by employing a simple posterior probability calibration technique on the DNN outputs, the performance of the HMM/DNN workflow can be significantly improved. Specifically, we apply a linear transformation to the activations of the output layer right before the softmax function, and fine-tune the parameters of this transformation. Of the calibration approaches tested, we obtained the best F1 scores when the posterior calibration process was adjusted to maximize the actual HMM-based evaluation metric.
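The calibration step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the linear transformation is a per-class scale and bias applied to the output-layer activations before the softmax, and it tunes those parameters by gradient descent on frame-level cross-entropy as a simple stand-in objective; the abstract reports the best F1 scores when calibration instead targeted the HMM-based evaluation metric. All function and parameter names here are hypothetical.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with the usual max-subtraction for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def calibrated_posteriors(logits, scale, bias):
    # Assumed form of the linear transformation: one scale and one bias
    # per class, applied to the pre-softmax activations.
    return softmax(logits * scale + bias)

def cross_entropy(posteriors, labels):
    # Mean negative log-probability of the correct class per frame.
    picked = posteriors[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked + 1e-12))

def fit_calibration(logits, labels, lr=0.05, steps=300):
    # Gradient descent on frame-level cross-entropy (a stand-in objective,
    # not the HMM-based metric mentioned in the abstract).
    n_classes = logits.shape[1]
    scale = np.ones(n_classes)    # identity transformation to start
    bias = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        p = calibrated_posteriors(logits, scale, bias)
        grad_z = (p - onehot) / len(labels)   # dLoss/d(transformed activations)
        scale -= lr * (grad_z * logits).sum(axis=0)
        bias -= lr * grad_z.sum(axis=0)
    return scale, bias
```

In a hybrid setup, the calibrated posteriors would replace the raw DNN outputs that are passed to the HMM decoder. Optimizing a segment-level metric such as F1 directly would require a different search procedure, since that metric is not differentiable with respect to the scale and bias.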


DOI: 10.21437/Interspeech.2019-2552

Cite as: Gosztolya, G., Tóth, L. (2019) Calibrating DNN Posterior Probability Estimates of HMM/DNN Models to Improve Social Signal Detection from Audio Data. Proc. Interspeech 2019, 515-519, DOI: 10.21437/Interspeech.2019-2552.


@inproceedings{Gosztolya2019,
  author={Gábor Gosztolya and László Tóth},
  title={{Calibrating DNN Posterior Probability Estimates of HMM/DNN Models to Improve Social Signal Detection from Audio Data}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={515--519},
  doi={10.21437/Interspeech.2019-2552},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2552}
}