Interspeech'2005 - Eurospeech
Human listeners not only outperform machine recognisers at the same signal-to-noise ratio but also cope better with non-stationary noise. In this paper we describe a series of experiments in which a machine recogniser emulates the human ability to select clean segments of speech from a noisy environment. We show that a vector quantiser that incorporates speaker-specific information can be used to estimate the similarity between the input signal and the speech vectors in a codebook, and so produce a probabilistic weighting for the distances used in pattern matching. With stationary noise, the probabilistic system performs slightly worse than a system using noise-matched models. With non-stationary noise it performs better, even though the matched system has knowledge of the background noise conditions while the probabilistic system has no information about the noise.
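The weighting idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the similarity measure or the weighting formula, so the squared Euclidean distortion, the exponential mapping, the `scale` parameter, and all function names below are assumptions chosen for illustration. Frames that lie close to a codeword of a speech codebook receive a weight near 1, frames far from every codeword (likely noise-dominated) receive a weight near 0, and the weights then scale the per-frame distances used in pattern matching.

```python
import math


def nearest_codeword_distance(frame, codebook):
    """Squared Euclidean distance from a frame to its closest codeword."""
    return min(
        sum((x - c) ** 2 for x, c in zip(frame, codeword))
        for codeword in codebook
    )


def speech_similarity_weights(frames, codebook, scale=1.0):
    """Map each frame's quantisation distortion to a weight in [0, 1].

    The exponential mapping and `scale` are illustrative assumptions,
    not taken from the paper.
    """
    return [
        math.exp(-nearest_codeword_distance(frame, codebook) / scale)
        for frame in frames
    ]


def weighted_match_distance(local_distances, weights):
    """Weighted average of per-frame pattern-matching distances."""
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no frame judged speech-like; nothing to accumulate
    return sum(w * d for w, d in zip(weights, local_distances)) / total


# Hypothetical two-dimensional codebook and two input frames.
codebook = [[0.0, 0.0], [1.0, 1.0]]
frames = [[0.0, 0.0], [5.0, 5.0]]  # first is speech-like, second is not
weights = speech_similarity_weights(frames, codebook)

# The large distance on the second frame is almost entirely discounted.
score = weighted_match_distance([2.0, 100.0], weights)
```

In this toy example the second frame is far from both codewords, so its large local distance contributes almost nothing to the overall match score, which is the behaviour one would want on segments corrupted by a burst of non-stationary noise.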
Bibliographic reference. Carey, Michael J. / Quang, Tuan P. (2005): "A speech similarity distance weighting for robust recognition", In INTERSPEECH-2005, 1257-1260.