This paper examines sources of variability in the speech signal using a new technique based on a nested spectral analysis of variance (SANOVA). By constructing an ANOVA in the modulation spectral domain, the technique characterizes unwanted variability in the time sequences of logarithmic energy caused by extraneous sources such as additive noise, convolutional noise, and the telephone handset transducer. Very low and moderate-to-high modulation frequencies are shown to be particularly affected by these sources. Verification results for 500 speakers on Switchboard data from the 1998 NIST speaker recognition evaluation confirm these conclusions. Bandpass filtering and downsampling of the time sequences of logarithmic energy, compared with conventional highpass filtering, is shown to yield a 13% relative reduction of the equal error rate (EER) in mismatched conditions.
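As an illustration of the final step in the abstract, the following is a minimal sketch of bandpass filtering and downsampling a log-energy time sequence. It is not the authors' implementation: the function name `bandpass_and_downsample`, the 100 Hz frame rate, the 1 to 10 Hz passband, the downsampling factor of 4, and the synthetic trajectory are all assumptions chosen for the example, and the ideal DFT mask stands in for whatever filter the paper actually uses.

```python
import cmath
import math


def bandpass_and_downsample(log_energy, frame_rate_hz, low_hz, high_hz, factor):
    """Keep modulation frequencies in [low_hz, high_hz], then keep every factor-th frame.

    Hypothetical helper for illustration; uses an ideal (brick-wall) DFT mask.
    """
    # The passband must fit below the Nyquist frequency of the reduced frame rate.
    if high_hz > frame_rate_hz / (2.0 * factor):
        raise ValueError("passband would alias after downsampling")

    n = len(log_energy)
    # Forward DFT of the trajectory (direct O(n^2) form, fine for a short sketch).
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(log_energy))
        for k in range(n)
    ]
    # Zero every bin whose modulation frequency lies outside the passband.
    for k in range(n):
        freq_hz = min(k, n - k) * frame_rate_hz / n
        if not (low_hz <= freq_hz <= high_hz):
            spectrum[k] = 0.0
    # Inverse DFT; the result is real because the mask is symmetric in frequency.
    filtered = [
        (sum(s * cmath.exp(2j * math.pi * k * t / n) for k, s in enumerate(spectrum)) / n).real
        for t in range(n)
    ]
    return filtered[::factor]


# Synthetic log-energy trajectory (assumed, 200 frames at 100 frames per second):
# a constant offset, a 4 Hz modulation, and a 30 Hz modulation.
frame_rate_hz = 100.0
trajectory = [
    5.0
    + math.sin(2 * math.pi * 4.0 * t / frame_rate_hz)
    + 0.5 * math.sin(2 * math.pi * 30.0 * t / frame_rate_hz)
    for t in range(200)
]

# Only the 4 Hz component falls inside the assumed 1 to 10 Hz passband.
reduced = bandpass_and_downsample(trajectory, frame_rate_hz, 1.0, 10.0, 4)
```

In this toy signal the constant offset and the 30 Hz component are removed, so `reduced` holds the 4 Hz modulation sampled at 25 frames per second. In practice a library FFT and a properly designed filter would replace the direct DFT and the brick-wall mask.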
Cite as: Vuuren, S.v., Hermansky, H. (1999) Speech variability in the modulation spectral domain - SANOVA technique -. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 2195-2198, doi: 10.21437/Eurospeech.1999-486
@inproceedings{vuuren99_eurospeech,
  author={Sarel van Vuuren and Hynek Hermansky},
  title={{Speech variability in the modulation spectral domain - SANOVA technique -}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={2195--2198},
  doi={10.21437/Eurospeech.1999-486}
}