ISCA Archive SPECOM 2004

Improving ASR performance on PDA by contamination of training data

Christophe Ris, Laurent Couvreur

Automatic Speech Recognition (ASR) on Personal Digital Assistants (PDAs) suffers from the intrinsic hardware characteristics of the audio interface, such as low-quality microphones and internal device noise. In this paper, we propose to compensate for these weaknesses by contaminating clean training data with the distortion sources specific to the target device. We present a method to estimate both the frequency response of the audio acquisition channel and the internal additive noise from a few tens of minutes of recordings on a PDA. The channel characteristics are estimated from the long-term power spectra of clean speech and PDA recordings, while the noise power spectrum is estimated from silence segments in these recordings. All recordings are made under controlled conditions, i.e. a quiet environment with no reverberation, to ensure that only the internal device characteristics are measured. The PDA-specific training data are then obtained by filtering the clean training data with the audio channel frequency response and adding the internal noise, and a device-specific acoustic model is finally trained. Recognition tests were performed on digit sequences on three different PDAs. Our approach was compared with other channel- and noise-robust methods and achieves very competitive performance.
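
The estimation and contamination steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the frame length, hop size, function names, the optional noise subtraction in the channel estimate, the linear-phase FIR design, and the use of spectrally shaped white noise as synthetic device noise are all assumptions made for illustration. Silence segments are assumed to have been located beforehand and passed in as a separate signal.

```python
import numpy as np

N_FFT, HOP = 512, 256  # illustrative frame settings, not taken from the paper


def long_term_power_spectrum(x, n_fft=N_FFT, hop=HOP):
    """Average the power spectra of Hann-windowed frames of x."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] for i in range(n_frames)])
    return np.mean(np.abs(np.fft.rfft(frames * window, axis=1)) ** 2, axis=0)


def estimate_channel_magnitude(clean, device, noise_psd=None, floor=1e-12):
    """Estimate |H(f)| from the ratio of long-term power spectra.

    Subtracting noise_psd before taking the ratio is an assumption of
    this sketch; the abstract does not state whether this is done.
    """
    p_clean = long_term_power_spectrum(clean)
    p_device = long_term_power_spectrum(device)
    if noise_psd is not None:
        p_device = p_device - noise_psd
    return np.sqrt(np.maximum(p_device, floor) / np.maximum(p_clean, floor))


def magnitude_to_fir(magnitude, n_fft=N_FFT):
    """Build a linear-phase FIR filter approximating a magnitude response."""
    h = np.fft.irfft(magnitude, n=n_fft)    # zero-phase impulse response
    h = np.roll(h, n_fft // 2)              # centre the response (causal filter)
    return h * np.hanning(n_fft + 1)[:n_fft]  # taper the tails to limit ripple


def contaminate(clean, channel_mag, noise_psd, rng, n_fft=N_FFT):
    """Filter clean speech with the channel and add synthetic device noise."""
    filtered = np.convolve(clean, magnitude_to_fir(channel_mag), mode="same")
    # Shape unit-variance white noise so that its long-term power spectrum
    # approximately matches noise_psd (normalised by the window energy).
    window_energy = np.sum(np.hanning(n_fft) ** 2)
    noise_mag = np.sqrt(noise_psd / window_energy)
    white = rng.standard_normal(len(clean))
    noise = np.convolve(white, magnitude_to_fir(noise_mag), mode="same")
    return filtered + noise
```

In use, `noise_psd` would be computed with `long_term_power_spectrum` on the silence portions of the device recordings, `estimate_channel_magnitude` would be run on time-aligned clean and device recordings of the same speech, and `contaminate` would then be applied to each clean training utterance before training the device-specific acoustic model.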


Cite as: Ris, C., Couvreur, L. (2004) Improving ASR performance on PDA by contamination of training data. Proc. 9th Conference on Speech and Computer (SPECOM 2004), 236-243

@inproceedings{ris04_specom,
  author={Christophe Ris and Laurent Couvreur},
  title={{Improving ASR performance on PDA by contamination of training data}},
  year=2004,
  booktitle={Proc. 9th Conference on Speech and Computer (SPECOM 2004)},
  pages={236--243}
}