Sixth European Conference on Speech Communication and Technology
(EUROSPEECH'99)

Budapest, Hungary
September 5-9, 1999

Speech Variability in the Modulation Spectral Domain - SANOVA Technique -

Sarel van Vuuren (1,2), Hynek Hermansky (1,3)

(1) Oregon Graduate Institute of Science and Technology, Portland, OR, USA
(2) SpeechWorks International, Boston, MA, USA
(3) International Computer Science Institute, Berkeley, CA, USA

This paper examines sources of variability in the speech signal using a new technique based on a nested spectral analysis of variance (SANOVA). By constructing an ANOVA in the modulation spectral domain, the technique characterizes the unwanted variability in time sequences of logarithmic energy that is caused by extraneous sources such as additive noise, convolutional noise, and telephone handset transducers. Very low and moderate-to-high modulation frequencies are shown to be particularly affected by these sources. Speaker verification results for 500 speakers on Switchboard data from the 1998 NIST speaker recognition evaluation confirm these conclusions. Compared with conventional highpass filtering, bandpass filtering and downsampling of the time sequences of logarithmic energy is shown to reduce the equal error rate (EER) by 13% relative in mismatched conditions.
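To illustrate the general idea of an analysis of variance carried out per modulation frequency, here is a minimal sketch. It is not the authors' nested SANOVA, whose exact formulation is in the full paper. Instead it performs a simplified one-way ANOVA. Each log-energy trajectory is mapped to its DFT magnitude spectrum, and for every modulation-frequency bin the fraction of total variance explained by a single factor (here a hypothetical handset label) is computed. The helper names (`modulation_spectrum`, `spectral_anova`, `synthetic_trajectory`), the use of magnitude spectra, the plain DFT and the synthetic data are all assumptions made for this example.

```python
import cmath
import math
from statistics import fmean


def modulation_spectrum(traj):
    """DFT magnitudes of one log-energy trajectory for bins 0..N//2.

    The mean is deliberately NOT removed, so a constant offset (e.g. a fixed
    convolutional channel) shows up in bin 0 (0 Hz modulation frequency).
    """
    n = len(traj)
    return [
        abs(sum(traj[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectral_anova(groups):
    """Simplified one-way ANOVA per modulation-frequency bin (hypothetical helper).

    groups: dict mapping a factor level (e.g. a handset label) to a list of
    equal-length log-energy trajectories.
    Returns, for each bin, between-group sum of squares / total sum of squares,
    i.e. the fraction of variance in that bin explained by the factor.
    """
    spectra = {lvl: [modulation_spectrum(tr) for tr in trajs]
               for lvl, trajs in groups.items()}
    n_bins = len(next(iter(spectra.values()))[0])
    ratios = []
    for k in range(n_bins):
        per_level = [[s[k] for s in specs] for specs in spectra.values()]
        all_vals = [v for vals in per_level for v in vals]
        grand = fmean(all_vals)
        ss_total = sum((v - grand) ** 2 for v in all_vals)
        ss_between = sum(len(vals) * (fmean(vals) - grand) ** 2 for vals in per_level)
        # Guard against bins with (numerically) no variance at all.
        ratios.append(ss_between / ss_total if ss_total > 1e-12 else 0.0)
    return ratios


def synthetic_trajectory(speaker, offset, n=32):
    """Hypothetical log-energy trajectory: a speaker-dependent 3-cycle
    modulation plus a constant channel offset in the log domain."""
    amp = 1.0 + 0.5 * speaker
    return [amp * math.sin(2 * math.pi * 3 * t / n + speaker) + offset
            for t in range(n)]


if __name__ == "__main__":
    groups = {
        "handset_A": [synthetic_trajectory(s, offset=0.0) for s in range(4)],
        "handset_B": [synthetic_trajectory(s, offset=5.0) for s in range(4)],
    }
    ratios = spectral_anova(groups)
    print(f"bin 0 (0 Hz): {ratios[0]:.3f}")
    print(f"bin 3: {ratios[3]:.3f}")
```

In the synthetic demo the second handset adds a constant offset to log energy, as a fixed convolutional channel would. The handset factor should therefore explain nearly all of the variance in bin 0 (0 Hz) and almost none in bin 3, where only the per-speaker amplitude differs. Assuming a hypothetical 100 Hz frame rate, bin k of a 32-frame trajectory corresponds to k * 100 / 32 Hz.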



Bibliographic reference.  Vuuren, Sarel van / Hermansky, Hynek (1999): "Speech variability in the modulation spectral domain - SANOVA technique -", In EUROSPEECH'99, 2195-2198.