7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Blind Normalization of Speech from Different Channels and Speakers

David N. Levin

University of Chicago, USA

This paper describes representations of time-dependent signals that are invariant under any invertible time-independent transformation of the signal time series. Such a representation is created by rescaling the signal in a non-linear, dynamic manner that is determined by recently encountered signal levels. This technique may make it possible to normalize signals that are related by channel-dependent and speaker-dependent transformations without having to characterize the form of those transformations, which remain unknown. The technique is illustrated by applying it to the time-dependent spectra of speech that has been filtered to simulate the effects of different channels. The experimental results show that the rescaled speech representations are largely normalized (i.e., channel-independent), despite the channel-dependence of the raw (unrescaled) speech.
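
The idea of rescaling a signal by its recently encountered levels can be illustrated with a toy sketch. This is not the method of the paper, whose rescaling procedure is not specified in this abstract. It swaps in a much simpler technique, windowed rank rescaling, which is invariant only under strictly increasing transformations of a one-dimensional signal, a narrower class than the invertible transformations the paper addresses. The function name `windowed_rank_rescale`, the window length, the test signal, and the cubic "channel" distortion are all assumptions made for illustration.

```python
import math


def windowed_rank_rescale(signal, window):
    """Toy rescaling: replace each sample by the fraction of the
    preceding `window` samples that lie strictly below it.

    Only the ordering of recent levels is used, so any strictly
    increasing distortion of the signal leaves the output unchanged.
    """
    rescaled = []
    for t in range(window, len(signal)):
        recent = signal[t - window:t]
        below = sum(1 for level in recent if level < signal[t])
        rescaled.append(below / window)
    return rescaled


# Hypothetical integer-quantized test signal (not speech data).
raw = [round(1000 * (math.sin(0.37 * t) + 0.5 * math.sin(1.13 * t)))
       for t in range(200)]

# Hypothetical "channel": a strictly increasing non-linear distortion
# that the rescaling step never sees or estimates.
distorted = [x ** 3 + 5 * x for x in raw]

rescaled_raw = windowed_rank_rescale(raw, window=25)
rescaled_distorted = windowed_rank_rescale(distorted, window=25)

# The two raw series differ, but their rescaled forms coincide.
print(raw != distorted)
print(rescaled_raw == rescaled_distorted)
```

Integer samples are used so that the distortion is computed exactly and ties between levels are preserved. The sketch only conveys the flavor of blind normalization: the distortion is removed without ever being characterized.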


Bibliographic reference. Levin, David N. (2002): "Blind normalization of speech from different channels and speakers", in ICSLP-2002, 1425-1428.