We perceive phonemes spoken by men, women and children as approximately the same, although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that the auditory system extracts and separates information about the size and the shape of a sound source. When the vocal tract is lengthened or shortened proportionally, keeping the same cross-sectional area function, its impulse response is correspondingly expanded or compressed in time. The Mellin transform maps the compressed and dilated versions of the impulse response onto the same magnitude distribution. In this paper we show that the Mellin transform can be applied to the stabilised wavelet transform that forms the basis of the Auditory Image Model (AIM) of processing in the auditory pathway. The combined processing normalises source-size information and produces a new and fruitful representation of source-shape information, referred to as the "Mellin Image". This "Stabilised Wavelet-Mellin Transform" (SWMT) also provides the mathematical framework for deriving the gammachirp auditory filterbank and the signal-synchronous analysis in AIM.
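The size-normalisation argument rests on one property: if h(t) is compressed to √a·h(at), its Mellin transform changes only in phase. The sketch below checks this numerically. It is a minimal illustration under stated assumptions, not the authors' SWMT: it assumes the Mellin variable s = 1/2 + jc, an energy-normalising factor √a on the compressed response, and a gammatone-like toy impulse response standing in for a vocal-tract response. The helper names (`impulse_response`, `mellin_transform`) and all parameter values are hypothetical. Substituting t = e^u turns the Mellin integral into a Fourier-type integral over log-time, so time compression becomes a shift in u, and a shift changes only the phase.

```python
import numpy as np


def impulse_response(t, f0=1.0, b=0.3, order=4):
    """Hypothetical gammatone-like toy response standing in for a vocal-tract impulse response."""
    return t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * f0 * t)


def mellin_transform(f, c_values, u_min=-12.0, u_max=6.0, du=1e-4):
    """Numerically evaluate M(s) = integral_0^inf f(t) t^(s-1) dt at s = 1/2 + j*c.

    With t = exp(u) and dt = t du, M(s) = integral f(e^u) e^(u/2) e^(j c u) du,
    which is evaluated here with a simple Riemann sum on a uniform log-time grid.
    """
    u = np.arange(u_min, u_max, du)
    g = f(np.exp(u)) * np.exp(u / 2)             # integrand on the log-time axis
    kernel = np.exp(1j * np.outer(c_values, u))  # one row per Mellin variable c
    return kernel @ g * du


c_values = np.linspace(-5.0, 5.0, 11)
a = 1.5  # hypothetical compression factor (shorter vocal tract, faster response)

original = mellin_transform(impulse_response, c_values)
compressed = mellin_transform(lambda t: np.sqrt(a) * impulse_response(a * t), c_values)

# Magnitudes agree; the compressed version differs only by the phase factor exp(-j c ln a).
print(np.max(np.abs(np.abs(original) - np.abs(compressed))))
print(np.allclose(compressed, original * np.exp(-1j * c_values * np.log(a))))
```

Without the √a factor the magnitudes would still differ by the constant a^(-1/2); the normalisation is chosen here only so that the two distributions coincide exactly.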
Cite as: Irino, T., Patterson, R.D. (1999) Stabilised wavelet mellin transform: an auditory strategy for normalising sound-source size. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 1899-1902, doi: 10.21437/Eurospeech.1999-416
@inproceedings{irino99_eurospeech,
  author={Toshio Irino and Roy D. Patterson},
  title={{Stabilised wavelet mellin transform: an auditory strategy for normalising sound-source size}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={1899--1902},
  doi={10.21437/Eurospeech.1999-416}
}