A novel multiple-input Kalman filtering (MIKF) framework is presented that estimates the clean speech signal by fusing the outputs of multiple speech enhancement systems. The MIKF framework generates a sample-by-sample minimum mean-square error estimate of the clean speech signal from these outputs. The residual noise in each input to the MIKF is modeled as an autoregressive (AR) process so that non-white noise can be accommodated, and the noise model is dynamically updated to handle non-stationary noise. Speech is also modeled as an AR process whose parameters are estimated from a codebook of suitably designed prototype AR parameters. Constraining the AR parameters via a codebook improves the quality of the estimate and makes it easy to integrate the MIKF system with a speech coder. The proposed framework also allows user-defined, heuristic weights to be applied to the inputs to the MIKF, which are the outputs of the contributing speech enhancement systems. Perceptual quality tests and an objective measure (segmental signal-to-noise ratio) both demonstrate that the estimate of the clean speech signal generated by the MIKF is superior to any of its inputs.
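The fusion idea described above can be illustrated with a minimal sketch of a multiple-input Kalman filter. This is not the authors' implementation: it assumes the speech AR coefficients and excitation variance are already known (the paper selects them from a codebook), and it treats the residual noise in each input as white with a known variance, whereas the paper models that noise as an AR process and updates it dynamically. The function name `mikf_fuse` and all of its parameters are hypothetical.

```python
import numpy as np


def mikf_fuse(inputs, ar_coeffs, drive_var, noise_vars):
    """Fuse M noisy observations of one AR(p) signal, sample by sample.

    inputs:     array (M, N), one row per enhancement-system output
    ar_coeffs:  array (p,), speech AR coefficients a_1..a_p (assumed known)
    drive_var:  variance of the AR excitation (assumed known)
    noise_vars: array (M,), residual-noise variance per input
                (assumed white here, unlike the AR noise model in the paper)
    """
    inputs = np.asarray(inputs, dtype=float)
    ar_coeffs = np.asarray(ar_coeffs, dtype=float)
    M, N = inputs.shape
    p = len(ar_coeffs)

    # State x_k = [s_k, s_{k-1}, ..., s_{k-p+1}] in companion form.
    F = np.zeros((p, p))
    F[0, :] = ar_coeffs
    F[1:, :-1] = np.eye(p - 1)
    Q = np.zeros((p, p))
    Q[0, 0] = drive_var

    # Every input observes the current speech sample plus its own noise.
    H = np.zeros((M, p))
    H[:, 0] = 1.0
    R = np.diag(np.asarray(noise_vars, dtype=float))

    x = np.zeros(p)
    P = 10.0 * np.eye(p)  # loose initial uncertainty (arbitrary choice)
    estimate = np.empty(N)
    for k in range(N):
        # Predict from the AR speech model.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update with all M inputs at once.
        S = H @ P @ H.T + R
        K = np.linalg.solve(S, H @ P).T  # equals P H^T S^{-1}
        x = x + K @ (inputs[:, k] - H @ x)
        P = (np.eye(p) - K @ H) @ P
        estimate[k] = x[0]
    return estimate
```

Under these simplifying assumptions, the Kalman gain weights each input according to its noise variance, so a less noisy input contributes more to the fused estimate. Handling colored residual noise, as the paper does, would require augmenting the state vector with the noise AR states.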
Cite as: Krishnan, V., Whitehead, P.S., Anderson, D.V., Clements, M.A. (2005) A framework for estimation of clean speech by fusion of outputs from multiple speech enhancement systems. Proc. Interspeech 2005, 2317-2320, doi: 10.21437/Interspeech.2005-740
@inproceedings{krishnan05_interspeech,
  author={Venkatesh Krishnan and Phil S. Whitehead and David V. Anderson and Mark A. Clements},
  title={{A framework for estimation of clean speech by fusion of outputs from multiple speech enhancement systems}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={2317--2320},
  doi={10.21437/Interspeech.2005-740}
}