Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

A Framework for Estimation of Clean Speech by Fusion of Outputs from Multiple Speech Enhancement Systems

Venkatesh Krishnan, Phil S. Whitehead, David V. Anderson, Mark A. Clements

Georgia Institute of Technology, Atlanta, GA, USA

A novel multiple-input Kalman filtering (MIKF) framework is presented that estimates the clean speech signal by fusing the outputs of multiple speech enhancement systems. From these outputs, the MIKF framework generates a sample-by-sample minimum mean-square error estimate of the clean speech signal. The residual noise in each input to the MIKF is modeled as an autoregressive (AR) process so that non-white noise can be accommodated, and the noise model is dynamically updated to handle non-stationary noise. Speech is also modeled as an AR process, with parameters estimated from a codebook of suitably designed prototype AR parameters. Constraining the AR parameters via a codebook improves output quality and makes it easy to integrate the MIKF system with a speech coder. The framework can also apply user-defined, heuristic weights to the MIKF inputs, that is, to the outputs of the contributing speech enhancement systems. Perceptual quality tests and an objective measure (segmental signal-to-noise ratio) both demonstrate that the clean speech estimate generated by the MIKF is superior to each of its inputs.
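
The fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several simplifying assumptions: the speech AR coefficients are known rather than selected from a codebook, the residual noise in each input is white with a known variance rather than an adaptively updated AR process, and no heuristic input weights are applied. The names `mikf_fuse` and `demo`, and all numeric values, are hypothetical.

```python
import numpy as np


def mikf_fuse(inputs, ar_coeffs, process_var, noise_vars):
    """Fuse K noisy versions of one signal with a multiple-input Kalman filter.

    inputs:      array of shape (K, N), one row per enhancement-system output
    ar_coeffs:   AR(p) speech coefficients a_1..a_p, assumed known here,
                 for s[n] = sum_i a_i * s[n-i] + w[n]
    process_var: variance of the excitation w[n]
    noise_vars:  length-K residual-noise variances (white-noise assumption)
    """
    inputs = np.asarray(inputs, dtype=float)
    K, N = inputs.shape
    p = len(ar_coeffs)

    # Companion-form transition for the state x[n] = [s[n], ..., s[n-p+1]].
    F = np.zeros((p, p))
    F[0, :] = ar_coeffs
    F[1:, :-1] = np.eye(p - 1)
    Q = np.zeros((p, p))
    Q[0, 0] = process_var

    # Every input observes the current sample s[n] plus its own noise.
    H = np.zeros((K, p))
    H[:, 0] = 1.0
    R = np.diag(noise_vars)

    x = np.zeros(p)
    P = np.eye(p)
    estimate = np.empty(N)
    for n in range(N):
        # Prediction step from the AR speech model.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step using all K inputs at once.
        S = H @ P @ H.T + R
        G = np.linalg.solve(S, H @ P).T  # Kalman gain P H^T S^-1
        x = x + G @ (inputs[:, n] - H @ x)
        P = (np.eye(p) - G @ H) @ P
        estimate[n] = x[0]
    return estimate


def demo(seed=0, n=4000):
    """Synthetic check: AR(2) 'speech' observed through two noisy inputs."""
    rng = np.random.default_rng(seed)
    a = np.array([1.5, -0.7])  # stable AR(2), chosen only for illustration
    w = rng.normal(0.0, 1.0, n)
    s = np.zeros(n)
    for i in range(2, n):
        s[i] = a[0] * s[i - 1] + a[1] * s[i - 2] + w[i]
    noise_vars = np.array([4.0, 9.0])
    y = s + rng.normal(size=(2, n)) * np.sqrt(noise_vars)[:, None]
    est = mikf_fuse(y, a, 1.0, noise_vars)

    def mse(z):
        return float(np.mean((z - s) ** 2))

    return mse(est), [mse(row) for row in y]


if __name__ == "__main__":
    fused_mse, input_mses = demo()
    print("fused MSE:", fused_mse)
    print("input MSEs:", input_mses)
```

In this sketch, non-white residual noise would be handled by appending each input's noise AR state to the state vector, which is the general direction the abstract describes; the white-noise version is used only to keep the example short.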


Bibliographic reference. Krishnan, Venkatesh / Whitehead, Phil S. / Anderson, David V. / Clements, Mark A. (2005): "A framework for estimation of clean speech by fusion of outputs from multiple speech enhancement systems", in INTERSPEECH-2005, 2317-2320.