16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Uncertainty Decoding for DNN-HMM Hybrid Systems Based on Numerical Sampling

Christian Huemmer (1), Roland Maas (1), Andreas Schwarz (1), Ramón F. Astudillo (2), Walter Kellermann (1)

(1) FAU Erlangen-Nürnberg, Germany
(2) INESC-ID Lisboa, Portugal

In this paper, we propose an uncertainty decoding scheme for DNN-HMM hybrid systems based on numerical sampling. A finite set of samples is drawn from the estimated probability distribution of the acoustic features and then passed through the feature transformations/extensions and the deep neural network (DNN). The nonlinearly transformed feature samples are then averaged at the output of the DNN to approximate the posterior distribution of the context-dependent Hidden Markov Model (HMM) states. This concept is experimentally verified for the REVERB challenge task using a reverberation-robust DNN-HMM hybrid system: the numerical sampling is performed in the logmelspec domain, where we estimate the posterior distribution of the acoustic features by combining coherence-based Wiener filtering and uncertainty propagation. The experimental results show that the proposed uncertainty decoding scheme significantly increases recognition accuracy, even for a small number of feature samples.
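
The sampling-and-averaging step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a Gaussian feature posterior with diagonal covariance, omits the feature transformations/extensions, and substitutes a small, randomly initialized network for the trained DNN. All names (`make_toy_dnn`, `sampled_state_posteriors`) and dimensions are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def make_toy_dnn(rng, dim_in, dim_hidden, num_states):
    # Hypothetical stand-in for a trained acoustic model:
    # one tanh hidden layer followed by a softmax output layer.
    W1 = rng.standard_normal((dim_in, dim_hidden))
    b1 = np.zeros(dim_hidden)
    W2 = rng.standard_normal((dim_hidden, num_states))
    b2 = np.zeros(num_states)

    def dnn(x):
        # x: (num_frames_or_samples, dim_in) -> (same, num_states)
        return softmax(np.tanh(x @ W1 + b1) @ W2 + b2)

    return dnn

def sampled_state_posteriors(mean, var, dnn, num_samples, rng):
    # Assumption: the feature posterior of one frame is Gaussian with
    # diagonal covariance, given by per-dimension mean and variance.
    noise = rng.standard_normal((num_samples, mean.shape[0]))
    samples = mean + np.sqrt(var) * noise
    # Pass every sample through the network, then average the outputs
    # to approximate the HMM state posterior under feature uncertainty.
    return dnn(samples).mean(axis=0)

rng = np.random.default_rng(0)
dnn = make_toy_dnn(rng, dim_in=4, dim_hidden=8, num_states=3)
mean = np.array([0.5, -1.0, 0.2, 1.5])   # illustrative feature mean
var = np.array([0.1, 0.4, 0.05, 0.2])    # illustrative feature variance
post = sampled_state_posteriors(mean, var, dnn, num_samples=10, rng=rng)
print(post, post.sum())
```

In this sketch, setting the variance to zero makes all samples coincide with the mean, so the result reduces to ordinary point-estimate decoding of the enhanced features.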

Bibliographic reference. Huemmer, Christian / Maas, Roland / Schwarz, Andreas / Astudillo, Ramón F. / Kellermann, Walter (2015): "Uncertainty decoding for DNN-HMM hybrid systems based on numerical sampling", in INTERSPEECH-2015, 3556-3560.