Sixth European Conference on Speech Communication and Technology
(EUROSPEECH'99)

Budapest, Hungary
September 5-9, 1999

Optimization Algorithms for Estimating Modulation Spectrum Domain Filters

Pau Paches-Leal (1), Richard C. Rose (1), Climent Nadeu (2)

(1) AT&T Labs-Research, Florham Park, NJ, USA
(2) Univ. Politecnica de Catalunya, Barcelona, Spain

The goal of the work described in this paper is to develop and evaluate procedures for automatically estimating modulation spectrum filters that compensate for distortions in the modulation spectrum domain. The modulation spectrum (MS) is often used to describe the time sequence of spectral parameters (TSSPs) derived from the speech waveform, and is thought to be a good representation of many sources of variability in speech. These procedures are intended for automatic speech recognition (ASR) applications in which the MS characteristics of the training and evaluation conditions are likely to be significantly mismatched. Results are presented for two tasks: one involving an artificially introduced MS distortion, and another involving different speaking styles in training and testing. The paper shows that these techniques are able to compensate for the effects of artificially introduced distortions that appear in testing. It also shows that a small degree of compensation is obtained for speaking style mismatch, and compares this result with the measured effects of the speaking style differences in the MS domain.

An algorithm is presented for automatically estimating filters in the modulation spectrum domain. These filters are used either to compensate for distortions in this domain or to obtain the difference coefficients that form part of the acoustic vector passed to the HMM-based speech model. The mathematical properties of the new algorithm are analyzed. Its performance is studied in two experiments: in the first, the goal is to alleviate an artificially simulated distortion; in the second, we try to compensate for speaking rate distortions in a database that has two distinct parts differing significantly in speaking rate.
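A modulation spectrum domain filter acts along the time axis of each spectral parameter trajectory, and difference (delta) coefficients are one such filter. The abstract does not state the estimation criterion used in the paper, so the sketch below is only an illustration under stated assumptions: a single odd-length FIR filter shared by all parameters, edge frames repeated at the boundaries, and a plain least-squares fit against a paired reference (a swapped-in criterion, not the authors' algorithm). The function names `ms_filter` and `estimate_ms_filter` are hypothetical.

```python
import numpy as np


def ms_filter(tssp, h):
    """Apply FIR filter h along time to every column of tssp (frames x dims).

    h holds taps for frame offsets -k..k (k = (len(h) - 1) // 2), so
    y[t] = sum_i h[i] * x[t + i - k]. Boundary frames are repeated
    (an assumption; other edge handling is possible).
    """
    h = np.asarray(h, dtype=float)
    if len(h) % 2 != 1:
        raise ValueError("this sketch assumes an odd-length, centred filter")
    k = (len(h) - 1) // 2
    x = np.asarray(tssp, dtype=float)
    T = x.shape[0]
    padded = np.pad(x, ((k, k), (0, 0)), mode="edge")
    # Weighted sum of time-shifted copies of the trajectories.
    return sum(h[i] * padded[i:i + T] for i in range(len(h)))


def estimate_ms_filter(source, target, taps):
    """Least-squares FIR filter mapping source trajectories to target ones.

    Hypothetical estimator: one filter shared across all dimensions,
    fitted so that ms_filter(source, h) is as close as possible to target.
    """
    if taps % 2 != 1:
        raise ValueError("taps must be odd")
    k = (taps - 1) // 2
    x = np.asarray(source, dtype=float)
    T = x.shape[0]
    padded = np.pad(x, ((k, k), (0, 0)), mode="edge")
    # Column i holds the trajectories shifted by offset i - k, all dims flattened,
    # so A @ h equals ms_filter(source, h).ravel().
    A = np.stack([padded[i:i + T].ravel() for i in range(taps)], axis=1)
    b = np.asarray(target, dtype=float).ravel()
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal((200, 12))        # stand-in for clean TSSPs
    smear = [0.25, 0.5, 0.25]                     # assumed low-pass MS distortion
    distorted = ms_filter(clean, smear)
    comp = estimate_ms_filter(distorted, clean, taps=7)
    print("distortion MSE:  ", np.mean((distorted - clean) ** 2))
    print("compensated MSE: ", np.mean((ms_filter(distorted, comp) - clean) ** 2))
```

With `h = [-0.5, 0, 0.5]`, `ms_filter` yields simple difference coefficients, y[t] = (x[t+1] - x[t-1]) / 2; a least-squares fit cannot do worse than leaving the distorted trajectories unchanged, because the identity filter is one of the candidates it considers.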



Bibliographic reference.  Paches-Leal, Pau / Rose, Richard C. / Nadeu, Climent (1999): "Optimization algorithms for estimating modulation spectrum domain filters", In EUROSPEECH'99, 89-92.