ISCA Archive ASR 2000

Reliable ASR based on unreliable features

I. Potamitis, Nikos Fakotakis, George Kokkinakis

This paper reports on a novel technique that links the basic concepts of multi-band Automatic Speech Recognition (ASR) and Missing Feature Theory (MFT). In the multi-band paradigm, the frequency spectrum is partitioned into narrow bands that are processed independently. In the context of MFT, the stochastic framework of continuous-density Hidden Markov Models (HMMs) is adapted to handle time-frequency regions corrupted by noise.

The present study alters the Mel Frequency Cepstrum Coefficients (MFCC) front-end by interposing a stage that evaluates and enhances the spectrum's reliability between the filter-bank output and the Discrete Cosine Transform. Each filter-bank output is treated as a time series, and non-linear time-series prediction techniques are used to assess its reliability separately. The key idea is to discern which feature in which bank is impaired in the current time frame and to use a properly selected Time Delay Neural Network (TDNN), drawn from a pool of available networks, to predict and substitute the unreliable features based on the reliable ones and their history.
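
The position of the reliability stage in the front-end can be illustrated with a minimal sketch. This is not the authors' implementation: a linear extrapolation from each band's own two most recent frames stands in for the pool of TDNNs, and a fixed prediction-error threshold stands in for the reliability evaluation. The names `repair_frame`, `dct_ii`, and `threshold` are hypothetical and chosen only for illustration.

```python
import numpy as np

def repair_frame(history, frame, threshold):
    """Flag and replace unreliable log filter-bank outputs in one frame.

    history   : (T, B) array of past frames (T >= 2), assumed reliable
    frame     : (B,) array, current log filter-bank outputs
    threshold : prediction error above which a band is treated as unreliable
                (an assumed criterion, not taken from the paper)
    """
    # Stand-in predictor: per-band linear extrapolation from the last two frames.
    # The paper uses TDNNs that also exploit the reliable bands of the current frame.
    predicted = 2.0 * history[-1] - history[-2]
    unreliable = np.abs(frame - predicted) > threshold
    repaired = np.where(unreliable, predicted, frame)
    return repaired, unreliable

def dct_ii(x, n_ceps):
    """Unnormalized DCT-II of a filter-bank vector, keeping n_ceps coefficients."""
    n = len(x)
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return basis @ x

# Toy example: three bands, the middle one corrupted in the current frame.
history = np.array([[1.0, 2.0, 3.0],
                    [1.0, 2.0, 3.0]])
frame = np.array([1.0, 9.0, 3.0])
repaired, mask = repair_frame(history, frame, threshold=1.0)
cepstrum = dct_ii(repaired, n_ceps=2)  # the DCT is applied only after the repair
```

The point of the sketch is the ordering: substitution happens on the filter-bank outputs, where corruption is local to a band, before the DCT spreads each band's value across all cepstral coefficients.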

Cite as: Potamitis, I., Fakotakis, N., Kokkinakis, G. (2000) Reliable ASR based on unreliable features. Proc. ASR2000 - Automatic Speech Recognition: Challenges for the New Millenium, 53-57

  author={I. Potamitis and Nikos Fakotakis and George Kokkinakis},
  title={{Reliable ASR based on unreliable features}},
  booktitle={Proc. ASR2000 - Automatic Speech Recognition: Challenges for the New Millenium},