A sound classification algorithm is presented which estimates the signal-to-noise ratio between speech and noise in 15 different frequency channels. The algorithm is based on the extraction of spectro-temporal features from the acoustical waveform. The approach is motivated by neurophysiological findings on periodicity coding in the auditory system of mammals. The extracted feature patterns are called Amplitude Modulation Spectrograms (AMS), as each AMS pattern contains information on both center frequencies and amplitude modulations in a short segment (32 ms) of the input signal. An artificial neural network is trained on a large set of AMS patterns from mixtures of speech and noise and is then used to predict the narrow-band signal-to-noise ratio of unknown sounds.
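The AMS feature described above can be illustrated with a minimal sketch: for one short segment, compute per-channel envelopes with a short-time FFT, then take a second FFT of each envelope over time to obtain the modulation spectrum. All parameters here (sub-window length, hop, linear bin grouping) are illustrative assumptions, not the authors' exact configuration, which uses an auditory-motivated analysis.

```python
import numpy as np

def ams_pattern(segment, n_channels=15, sub_len=64, hop=32):
    """Toy AMS pattern for one short signal segment (e.g., 32 ms at 16 kHz).

    Hypothetical parameters, not the paper's exact setup:
    - sub_len/hop: sub-window length and hop (samples) for the envelope STFT
    - n_channels: number of frequency channels (15, as in the abstract)
    """
    # Short-time FFT: magnitude per sub-window yields channel envelopes.
    n_sub = 1 + (len(segment) - sub_len) // hop
    win = np.hanning(sub_len)
    frames = np.stack([segment[i * hop : i * hop + sub_len] * win
                       for i in range(n_sub)])
    spec = np.abs(np.fft.rfft(frames, axis=1))       # (n_sub, sub_len//2 + 1)

    # Group FFT bins into n_channels bands (linear grouping for simplicity;
    # an auditory filter bank would be used in practice).
    edges = np.linspace(0, spec.shape[1], n_channels + 1, dtype=int)
    env = np.stack([spec[:, edges[c]:edges[c + 1]].sum(axis=1)
                    for c in range(n_channels)])     # (n_channels, n_sub)

    # Modulation spectrum: FFT of each (mean-removed) envelope over time.
    mod = np.abs(np.fft.rfft(env - env.mean(axis=1, keepdims=True), axis=1))
    return mod  # rows: frequency channels, cols: modulation frequencies
```

In the paper's setting, such a pattern (one per 32 ms segment) would form the input to a neural network that predicts the SNR in each of the 15 channels.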
Cite as: Tchorz, J., Kollmeier, B. (1999) Speech detection and SNR prediction basing on amplitude modulation pattern recognition. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 2399-2402, doi: 10.21437/Eurospeech.1999-526
@inproceedings{tchorz99_eurospeech,
  author={Jürgen Tchorz and Birger Kollmeier},
  title={{Speech detection and SNR prediction basing on amplitude modulation pattern recognition}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={2399--2402},
  doi={10.21437/Eurospeech.1999-526}
}