Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Comparison of Time & Frequency Filtering and Cepstral-Time Matrix Approaches in ASR

Dusan Macho (1,2), Climent Nadeu (1), Peter Jancovic (2), Gregor Rozinaj (2), Javier Hernando (1)

(1) TALP Research Center, TSC Dept., UPC, Barcelona, Spain
(2) Dept. of Telecommunications, STU, Bratislava, Slovakia

In current speech recognition systems, speech is represented by a 2-D sequence of parameters that model the temporal evolution of the spectral envelope of speech. Linear transformations or filtering along both the time and frequency axes of that 2-D sequence are used to enhance the discriminative ability and robustness of the speech parameters in the HMM pattern-matching formalism. In this paper, we compare two recently reported approaches that operate on the sequence of logarithmically compressed mel-scaled filter-bank energies: the first approach, TIFFING (TIme and Frequency FilterING), applies FIR filters to that 2-D sequence along both axes, while the second, CTM (Cepstral Time Matrix), uses the DCT to compute a set of parameters in the 2-D transformed domain. The two approaches are compared in three ways: (1) analytically, using the Fourier transform, (2) statistically, and (3) by performing recognition tests with clean and noisy speech.
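
As an illustration only, the two operations can be sketched on a matrix of log mel filter-bank energies with shape (frames, bands). The filter taps, the block length and the number of retained DCT coefficients below are assumptions chosen for the example and are not taken from this paper; the function names are hypothetical.

```python
import numpy as np

def fir_along_axis(x, taps, axis):
    """Apply an FIR filter along one axis (same-length output, zero-padded edges)."""
    taps = np.asarray(taps, dtype=float)
    return np.apply_along_axis(
        lambda v: np.convolve(v, taps, mode="same"), axis, x
    )

def tiffing(log_fbe, freq_taps=(1.0, 0.0, -1.0), time_taps=(1.0, 0.0, -1.0)):
    """TIFFING-style sketch: FIR filtering along frequency, then along time.

    The default taps implement y[n] = x[n+1] - x[n-1] on each axis.
    They are an assumed choice for illustration, not the paper's filters.
    """
    y = fir_along_axis(log_fbe, freq_taps, axis=1)  # across bands
    return fir_along_axis(y, time_taps, axis=0)     # across frames

def dct_matrix(n_out, n_in):
    """First n_out rows of an unnormalized DCT-II basis for length-n_in input."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))

def ctm(log_fbe_block, n_time=3, n_freq=4):
    """CTM-style sketch: 2-D DCT of a block of frames, keeping low-order terms.

    n_time and n_freq (retained coefficients per axis) are assumed values.
    """
    n_frames, n_bands = log_fbe_block.shape
    return dct_matrix(n_time, n_frames) @ log_fbe_block @ dct_matrix(n_freq, n_bands).T

# Synthetic stand-in for log filter-bank energies: 10 frames, 8 bands.
rng = np.random.default_rng(0)
log_fbe = rng.normal(size=(10, 8))

tiffing_features = tiffing(log_fbe)   # one filtered vector per frame
ctm_features = ctm(log_fbe[:6])       # one matrix per 6-frame block
```

In this sketch the filtering approach keeps one parameter vector per frame with the same dimensions as the input, whereas the DCT approach maps each block of frames to a small matrix of 2-D transform coefficients.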

