SAPA-SCALE Conference 2012

Portland, OR, USA
September 7-8, 2012

Joint Uncertainty Decoding with Unscented Transform for Noise Robust Subspace Gaussian Mixture Models

Liang Lu, Arnab Ghoshal, Steve Renals

Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK

Common noise compensation techniques use a vector Taylor series (VTS) to approximate the mismatch function. Recent work shows that the approximation accuracy may be improved by sampling. One such sampling technique is the unscented transform (UT), which draws samples deterministically from the clean speech and noise models to derive the noise-corrupted speech parameters. This paper applies UT to noise compensation of the subspace Gaussian mixture model (SGMM). Since UT requires a relatively small number of samples for accurate estimation, it has a significantly lower computational cost than random sampling techniques. However, the number of surface Gaussians in an SGMM is typically very large, which makes the direct application of UT to compensate individual Gaussian components computationally impractical. In this paper, we avoid the computational burden by employing UT in the framework of joint uncertainty decoding (JUD), which groups all the Gaussian components into a small number of classes and shares the compensation parameters within each class. We evaluate the JUD-UT technique for an SGMM system using the Aurora 4 corpus. Experimental results indicate that UT can lead to increased accuracy compared to the VTS approximation if the JUD phase factor is untuned, and to similar accuracy if the phase factor is tuned empirically.
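As an illustration of the deterministic sampling idea, the following is a minimal sketch of an unscented transform, not the implementation used in the paper. It assumes diagonal covariances, a joint vector of one clean speech dimension and one noise dimension, and a simplified log-spectral mismatch function y = x + log(1 + exp(n - x)) with no channel term and no phase factor. The function names, the scaling parameter kappa and the numeric values are hypothetical and chosen only for the example.

```python
import math


def sigma_points_diag(mean, var, kappa=1.0):
    """Return the 2d+1 sigma points and weights of a diagonal Gaussian.

    kappa is a scaling parameter; its value here is an assumption.
    """
    d = len(mean)
    points = [list(mean)]                 # centre point
    weights = [kappa / (d + kappa)]
    for i in range(d):
        # With a diagonal covariance, the matrix square root reduces
        # to a per-dimension standard deviation.
        offset = math.sqrt((d + kappa) * var[i])
        for sign in (1.0, -1.0):
            p = list(mean)
            p[i] += sign * offset
            points.append(p)
            weights.append(1.0 / (2.0 * (d + kappa)))
    return points, weights


def unscented_transform(f, mean, var, kappa=1.0):
    """Estimate the mean and variance of the scalar f(z), z ~ N(mean, diag(var))."""
    points, weights = sigma_points_diag(mean, var, kappa)
    ys = [f(p) for p in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var


def mismatch(z):
    """Simplified mismatch function (assumption: no channel, no phase term)."""
    x, n = z                              # clean speech, additive noise
    return x + math.log1p(math.exp(n - x))


# Hypothetical clean speech and noise statistics for one dimension.
y_mean, y_var = unscented_transform(mismatch, mean=[2.0, 0.0], var=[1.0, 0.5])
print(y_mean, y_var)
```

Only five function evaluations are needed for this two-dimensional joint vector, which is the source of the cost advantage over random sampling. In a JUD setting, a transform of this kind would be computed once per regression class rather than once per Gaussian component.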


Bibliographic reference. Lu, Liang / Ghoshal, Arnab / Renals, Steve (2012): "Joint uncertainty decoding with unscented transform for noise robust subspace Gaussian mixture models", in SAPA-SCALE-2012, 40-45.