INTERSPEECH 2007
8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Quality Assessment of Speech Enhancement Systems by Separation of Enhanced Speech, Noise, and Echo

Tim Fingscheidt, Suhadi Suhadi

Braunschweig Technical University, Germany

Quality assessment of speech enhancement systems is a nontrivial task, especially when (residual) noise and echo signal components occur. We present a signal separation scheme that allows for a detailed analysis of unknown speech enhancement systems in a black box test scenario. Our approach separates the speech, (residual) noise, and (residual) echo components of the speech enhancement system's output in the sending direction (uplink direction). This makes it possible to judge the speech degradation and the noise and echo attenuation/degradation independently. While state-of-the-art tests always try to judge the signal mixture in the sending direction, our new scheme allows a more reliable analysis in less time. It will be very useful for testing hands-free devices in practice as well as for testing speech enhancement algorithms in research and development.

Acoustic Material

S.WAV   Near-end clean speech original (see Fig. 2).
N.WAV   Near-end acoustic background noise (see Fig. 2).
D.WAV   Echo signal as captured by the near-end microphone during the real-time test of the speech enhancement system (see Fig. 2).
Y.WAV   Input signal to the speech enhancement system in send direction: Y = S + D + N (see Fig. 2).
S_HAT.WAV   Output signal of the speech enhancement system in send direction, i.e., enhanced speech signal or estimate of the near-end speech signal (see Fig. 2).
S_TILDE.WAV   Separated near-end speech component of S_HAT, 1st output of our algorithm (compare to Fig. 1). Compare this signal to the input speech component S and to the output signal mixture S_HAT: it is audibly perceived as part of S_HAT, and it clearly shows slight distortions relative to S that are due to the speech enhancement system.
N_TILDE.WAV   Separated residual noise component of S_HAT, 2nd output of our algorithm (compare to Fig. 1). Compare this signal to the input noise component N and to the output signal mixture S_HAT: it is audibly perceived as part of S_HAT, and the different suppression weights during speech activity and speech pauses are clearly audible.
D_TILDE.WAV   Separated residual echo component of S_HAT, 3rd output of our algorithm (compare to Fig. 1). Compare this signal to the input echo component D and to the output signal mixture S_HAT: it is audibly perceived as part of S_HAT. One also notices that the echo canceller did not really capture the echo path at the beginning; after the speech pause the situation improves somewhat.
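
The listening comparisons above can be complemented by a simple level measurement. The following is a minimal sketch, not the separation algorithm of the paper: it only forms the input mixture Y = S + D + N and estimates how much a component is attenuated between the input and its separated counterpart. The function names (mix, power, attenuation_db) and the toy sample values are assumptions made for illustration; in practice the samples would be read from the WAV files listed above.

```python
import math

def mix(s, d, n):
    """Form the send-direction input Y = S + D + N sample by sample."""
    if not (len(s) == len(d) == len(n)):
        raise ValueError("components must have equal length")
    return [a + b + c for a, b, c in zip(s, d, n)]

def power(x):
    """Mean power of a signal (mean of the squared samples)."""
    return sum(v * v for v in x) / len(x)

def attenuation_db(component_in, component_out):
    """Level reduction in dB from an input component to its separated counterpart."""
    p_in, p_out = power(component_in), power(component_out)
    if p_in == 0.0 or p_out == 0.0:
        raise ValueError("attenuation is undefined for an all-zero signal")
    return 10.0 * math.log10(p_in / p_out)

# Toy signals standing in for S.WAV, D.WAV, and N.WAV (hypothetical values).
s = [0.5, -0.5, 0.5, -0.5]
d = [0.1, 0.1, -0.1, -0.1]
n = [0.2, -0.2, -0.2, 0.2]
y = mix(s, d, n)

# Hypothetical N_TILDE: pretend the system halved the noise amplitude.
n_tilde = [0.5 * v for v in n]
noise_att = attenuation_db(n, n_tilde)
print(f"noise attenuation: {noise_att:.2f} dB")
```

A single number over the whole file hides the time-varying behaviour described above (different suppression weights in speech activity and speech pauses, slow convergence of the echo canceller), so a frame-wise evaluation of the same quantity would likely be more informative.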

Bibliographic reference.  Fingscheidt, Tim / Suhadi, Suhadi (2007): "Quality assessment of speech enhancement systems by separation of enhanced speech, noise, and echo", In INTERSPEECH-2007, 818-821.