ISCA Archive Eurospeech 1999

Comparison of time & frequency filtering and cepstral-time matrix approaches in ASR

Dusan Macho, Climent Nadeu, Peter Jancovic, Gregor Rozinaj, Javier Hernando

In current speech recognition systems, speech is represented by a 2-D sequence of parameters that models the temporal evolution of the spectral envelope of speech. Linear transformations or filtering along both the time and frequency axes of that 2-D sequence are used to enhance the discriminative ability and robustness of the speech parameters in the HMM pattern-matching formalism. In this paper, we compare two recently reported approaches that operate on the sequence of logarithmically compressed mel-scaled filter-bank energies: the first approach - TIFFING (TIme and Frequency FilterING) - applies FIR filters to that 2-D sequence along both axes, while the second one - CTM (Cepstral Time Matrix) - uses the DCT to compute a set of parameters in the 2-D transformed domain. They are compared in several ways: (1) analytically, using the Fourier transform, (2) statistically, and (3) by performing recognition tests with clean and noisy speech.
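
The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`fir_filter`, `dct2`, `tiffing`, `ctm`), the zero-padding at the edges, the unnormalized DCT-II, and the example taps `[1, 0, -1]` are all assumptions made here for illustration, and the abstract does not specify the actual filters or matrix sizes. The input is assumed to be a block of log mel filter-bank energies stored as a list of frames, each frame a list of band values.

```python
import math


def fir_filter(seq, taps):
    """Centered FIR filter over a 1-D list, zero-padded at the edges (assumption)."""
    half = len(taps) // 2
    out = []
    for n in range(len(seq)):
        acc = 0.0
        for k, h in enumerate(taps):
            idx = n + half - k  # y[n] = sum_k h[k] * x[n + half - k]
            if 0 <= idx < len(seq):
                acc += h * seq[idx]
        out.append(acc)
    return out


def dct2(seq, n_coeffs):
    """First n_coeffs coefficients of an unnormalized DCT-II of a 1-D list."""
    n_pts = len(seq)
    return [
        sum(x * math.cos(math.pi * k * (n + 0.5) / n_pts) for n, x in enumerate(seq))
        for k in range(n_coeffs)
    ]


def tiffing(logfbe, freq_taps, time_taps):
    """Filter along frequency within each frame, then along time within each band."""
    freq_filtered = [fir_filter(frame, freq_taps) for frame in logfbe]
    n_bands = len(freq_filtered[0])
    # Filter the time trajectory of each band separately.
    band_tracks = [
        fir_filter([frame[b] for frame in freq_filtered], time_taps)
        for b in range(n_bands)
    ]
    # Transpose back to a frames-by-bands layout.
    return [[band_tracks[b][t] for b in range(n_bands)] for t in range(len(logfbe))]


def ctm(logfbe, n_ceps, n_time):
    """DCT along frequency per frame, then DCT along time per cepstral coefficient."""
    cepstra = [dct2(frame, n_ceps) for frame in logfbe]
    # Row i holds the time-DCT of the trajectory of cepstral coefficient i.
    return [dct2([c[i] for c in cepstra], n_time) for i in range(n_ceps)]


# Hypothetical toy block: 5 frames of 4 bands each.
block = [[0.1 * t + 0.5 * b for b in range(4)] for t in range(5)]
filtered = tiffing(block, [1, 0, -1], [1, 0, -1])  # same size as the input block
matrix = ctm(block, n_ceps=3, n_time=2)            # 3 x 2 matrix for the whole block
```

In this sketch, TIFFING returns a filtered sequence with the same layout as the input, whereas CTM summarizes a whole block of frames in a small matrix of 2-D transform coefficients.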


doi: 10.21437/Eurospeech.1999-23

Cite as: Macho, D., Nadeu, C., Jancovic, P., Rozinaj, G., Hernando, J. (1999) Comparison of time & frequency filtering and cepstral-time matrix approaches in ASR. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 77-80, doi: 10.21437/Eurospeech.1999-23

@inproceedings{macho99_eurospeech,
  author={Dusan Macho and Climent Nadeu and Peter Jancovic and Gregor Rozinaj and Javier Hernando},
  title={{Comparison of time \& frequency filtering and cepstral-time matrix approaches in ASR}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={77--80},
  doi={10.21437/Eurospeech.1999-23}
}