Interspeech'2005 - Eurospeech
A novel multiple-input Kalman filtering (MIKF) framework is presented that estimates the clean speech signal by fusing the outputs of multiple speech enhancement systems. The MIKF framework generates a sample-by-sample minimum mean-square error estimate of the clean speech signal from these outputs. The residual noise in each input to the MIKF is modeled as an autoregressive (AR) process so that non-white noise can be accommodated, and the noise model is dynamically updated to handle non-stationary noise. Speech is also modeled as an AR process whose parameters are estimated from a codebook of suitably designed prototype AR parameters. Constraining the AR parameters via a codebook improves the quality of the estimated speech and makes it easy to integrate the MIKF system with a speech coder. The framework can also apply user-defined, heuristic weights to its inputs, that is, to the outputs of the contributing speech enhancement systems. Perceptual quality tests and objective measures (segmental signal-to-noise ratio) both demonstrate that the estimate of the clean speech signal generated by the MIKF is superior to each of its inputs.
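The fusion idea can be illustrated with a deliberately simplified sketch. It is not the MIKF of the paper: it assumes a first-order AR speech model with known parameters (no codebook), white rather than AR residual noise in each input, fixed noise variances, and no heuristic weights. The function name `fuse_kalman` and all parameter values are hypothetical and chosen only for illustration. Under these assumptions, a scalar Kalman filter that processes the inputs one after another at each sample yields a sample-by-sample minimum mean-square error estimate from several noisy versions of the same signal.

```python
import random


def fuse_kalman(inputs, a, q, r):
    """Fuse several noisy versions of one signal, sample by sample.

    inputs: list of equal-length sequences, one per enhancement system
    a:      coefficient of the assumed AR(1) model s[n] = a*s[n-1] + w[n]
    q:      variance of the excitation w[n]
    r:      residual-noise variance of each input (assumed white and
            mutually independent; the paper uses AR noise models instead)
    """
    x = 0.0                   # state estimate
    p = q / (1.0 - a * a)     # stationary variance of the AR(1) process
    estimate = []
    for n in range(len(inputs[0])):
        # Prediction step from the AR(1) speech model.
        x = a * x
        p = a * a * p + q
        # Update step: with independent noises, applying one scalar update
        # per input is equivalent to a single joint multi-input update.
        for y, r_i in zip(inputs, r):
            k = p / (p + r_i)            # Kalman gain for this input
            x = x + k * (y[n] - x)
            p = (1.0 - k) * p
        estimate.append(x)
    return estimate


def mse(u, v):
    """Mean squared error between two equal-length sequences."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v)) / len(u)


# Synthetic demonstration (all values are illustrative assumptions).
rng = random.Random(0)
a, q = 0.95, 1.0
clean, s = [], 0.0
for _ in range(5000):
    s = a * s + rng.gauss(0.0, q ** 0.5)
    clean.append(s)

r = [1.0, 2.0]  # two hypothetical enhancement outputs of differing quality
inputs = [[c + rng.gauss(0.0, r_i ** 0.5) for c in clean] for r_i in r]
fused = fuse_kalman(inputs, a, q, r)

print("input MSEs:", [mse(y, clean) for y in inputs])
print("fused MSE: ", mse(fused, clean))
```

On this synthetic data the fused estimate should have a lower mean squared error than either input, which mirrors the abstract's claim only qualitatively. Handling colored residual noise as described in the abstract would require augmenting the state with the noise AR processes, which this sketch omits.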
Bibliographic reference. Krishnan, Venkatesh / Whitehead, Phil S. / Anderson, David V. / Clements, Mark A. (2005): "A framework for estimation of clean speech by fusion of outputs from multiple speech enhancement systems", In INTERSPEECH-2005, 2317-2320.