In this paper, a novel algorithm that resembles amplitude demodulation in the frequency domain is introduced, and its application to automatic speech recognition (ASR) is studied. Speech production can be regarded as a result of amplitude modulation (AM) with the source (excitation) spectrum being the carrier and the vocal tract transfer function (VTTF) being the modulating signal. From this point of view, the VTTF can be recovered by amplitude demodulation. Amplitude demodulation of the speech spectrum is achieved by a novel nonlinear technique, which effectively performs envelope detection by using amplitudes of the harmonics and discarding inter-harmonic valleys. The technique is noise robust since frequency bands of low energy are discarded. The same principle is used to reshape the detected envelope. The algorithm is then used to construct an ASR feature extraction module. It is shown that this technique achieves superior performance to MFCCs in the presence of additive noise. Recognition accuracy is further improved if peak isolation is also performed.
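As a rough illustration of the envelope-detection idea described above (keep harmonic peak amplitudes, discard inter-harmonic valleys), the following minimal sketch picks local maxima of a magnitude spectrum and interpolates linearly between them. It is not the authors' algorithm: the function name `harmonic_envelope`, the `min_rel` threshold, the linear interpolation, and the synthetic test spectrum are all assumptions made for illustration only.

```python
import numpy as np

def harmonic_envelope(mag, min_rel=0.0):
    """Hypothetical helper: estimate a spectral envelope from local maxima.

    mag     -- 1-D magnitude spectrum (one frame)
    min_rel -- assumed threshold; peaks below min_rel * max(mag) are dropped
    """
    mag = np.asarray(mag, dtype=float)
    # Interior bins that are local maxima (candidate harmonic peaks).
    interior = np.arange(1, len(mag) - 1)
    is_peak = (mag[interior] > mag[interior - 1]) & (mag[interior] >= mag[interior + 1])
    peaks = interior[is_peak]
    # Discard low-energy peaks, which are the ones most easily masked by noise.
    peaks = peaks[mag[peaks] >= min_rel * mag.max()]
    if peaks.size == 0:
        return mag.copy()
    # Fill the inter-harmonic valleys by linear interpolation between peaks;
    # np.interp holds the first/last peak value constant beyond the end peaks.
    return np.interp(np.arange(len(mag)), peaks, mag[peaks])

# Synthetic example: a smooth "VTTF" modulating a harmonic carrier.
bins = np.arange(257)
vttf = 1.0 + 0.5 * np.cos(2 * np.pi * bins / 256)  # assumed smooth envelope
carrier = np.zeros(257)
carrier[8:256:8] = 1.0                              # harmonics every 8 bins
spectrum = vttf * carrier                           # AM in the frequency domain
env = harmonic_envelope(spectrum)
```

In this toy setting the recovered `env` matches `vttf` exactly at the harmonic bins and stays close to it in between, while the zero-valued valleys of `spectrum` no longer appear in the estimate.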
Cite as: Zhu, Q., Alwan, A. (2000) AM-demodulation of speech spectra and its application to noise robust speech recognition. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 1, 341-344, doi: 10.21437/ICSLP.2000-85
@inproceedings{zhu00_icslp, author={Qifeng Zhu and Abeer Alwan}, title={{AM-demodulation of speech spectra and its application to noise robust speech recognition}}, year=2000, booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)}, pages={vol. 1, 341-344}, doi={10.21437/ICSLP.2000-85} }