Interspeech'2005 - Eurospeech
Automatic speech recognizers perform poorly when training and test data differ systematically in noise and channel characteristics. One manifestation of such differences is variation in the probability density functions (pdfs) of training and test features. Consequently, both automatic speech recognition and automatic speaker identification may be severely degraded. Previous attempts to minimize this problem include Cepstral Mean and Variance Normalization and the transformation of all speech features to a univariate Gaussian pdf. In this paper, we present a quantile-based cumulative distribution function (CDF) matching technique for data drawn from different distributions. This method can be used to compensate for the systematic marginal (i.e., per-feature) differences between training and test features. We further propose a linear covariance normalization technique to compensate for differences in covariance properties between training and test data. Experimental results are given that illustrate these techniques for speech recognition and automatic speaker identification.
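The abstract does not spell out either transformation, so the following is only a minimal sketch of the two ideas under stated assumptions, not the authors' implementation. It assumes features are stored as NumPy arrays of shape (frames, dimensions), that marginal matching is done by a piecewise-linear map between empirical quantiles, and that covariance normalization is done by whitening the test features and recoloring them with the training covariance. The function names `quantile_match` and `covariance_normalize`, the helper `_sym_power`, and the parameter `n_quantiles` are hypothetical.

```python
import numpy as np


def quantile_match(test, train, n_quantiles=101):
    """Map each test feature onto the training marginal by matching quantiles.

    Assumption: a piecewise-linear map between empirical quantiles stands in
    for the paper's CDF matching; the paper's exact procedure may differ.
    """
    test = np.asarray(test, dtype=float)
    train = np.asarray(train, dtype=float)
    probs = np.linspace(0.0, 1.0, n_quantiles)
    out = np.empty_like(test)
    for d in range(test.shape[1]):  # each feature is handled independently
        q_test = np.quantile(test[:, d], probs)
        q_train = np.quantile(train[:, d], probs)
        # A test value at a given test quantile is sent to the training value
        # at the same quantile, with linear interpolation in between.
        # Heavily tied feature values would need extra care here.
        out[:, d] = np.interp(test[:, d], q_test, q_train)
    return out


def _sym_power(cov, power, floor=1e-10):
    """Raise a symmetric positive semi-definite matrix to a real power."""
    vals, vecs = np.linalg.eigh(cov)
    vals = np.maximum(vals, floor)  # guard against tiny or negative eigenvalues
    return (vecs * vals**power) @ vecs.T


def covariance_normalize(test, train):
    """Linearly transform test features to have the training mean and covariance.

    Assumption: whitening followed by recoloring is used as one possible
    linear covariance normalization.
    """
    test = np.asarray(test, dtype=float)
    train = np.asarray(train, dtype=float)
    cov_test = np.cov(test, rowvar=False)
    cov_train = np.cov(train, rowvar=False)
    transform = _sym_power(cov_train, 0.5) @ _sym_power(cov_test, -0.5)
    return (test - test.mean(axis=0)) @ transform.T + train.mean(axis=0)
```

In this sketch the two steps are complementary: `quantile_match` acts on one feature at a time and so can only correct marginal differences, while `covariance_normalize` acts on the whole feature vector and corrects second-order differences between features.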
Bibliographic reference. Prasad, Saurabh / Zahorian, Stephen A. (2005): "Nonlinear and linear transformations of speech features to compensate for channel and noise effects", In INTERSPEECH-2005, 969-972.