ISCA Archive SPECOM 2004

The influence of audio compression on speech recognition systems

Paulo Sirum Ng, Ivandro Sanches

A large amount of disk space is needed to store the increasing volume of speech data that is becoming available for most languages, either through data logging on the application side or through speech data acquisition for large databases. One way to reduce the need for disk space is to compress the speech data using modern perceptual audio coding techniques such as MPEG Layer-3 (also known as MP3) or Dolby AC-3. This article evaluates the performance of a speech recognizer for Brazilian Portuguese whose acoustic models are trained with audio data coded with an MP3 codec at four different bitrates (16 kbps, 24 kbps, 32 kbps and 64 kbps), in order to assess the influence of this audio compression technique when applied to speech data. The word accuracy rates are compared with those obtained when training acoustic models with the original PCM recordings at 16 kHz and 16 bits per sample. Our experiments indicate that at audio bit rates of 32 kbps or higher the word accuracy rates show very little degradation in certain contexts, which means that up to 8 times less disk space is needed.
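The "up to 8 times" figure can be checked with simple bitrate arithmetic. The sketch below is illustrative only and rests on assumptions not stated in the abstract: the PCM recordings are taken to be single-channel (mono), 1 kbps is taken as 1000 bits per second, and file-format or container overhead is ignored. The function names are hypothetical.

```python
# Illustrative sketch: raw PCM bitrate versus the MP3 bitrates in the abstract.
# Assumptions (not stated in the abstract): mono audio, 1 kbps = 1000 bit/s,
# and no container or header overhead.

PCM_SAMPLE_RATE_HZ = 16_000   # 16 kHz, from the abstract
PCM_BITS_PER_SAMPLE = 16      # 16 bits per sample, from the abstract
PCM_CHANNELS = 1              # assumed mono

MP3_BITRATES_KBPS = (16, 24, 32, 64)  # bitrates evaluated in the abstract


def pcm_bitrate_kbps(sample_rate_hz=PCM_SAMPLE_RATE_HZ,
                     bits_per_sample=PCM_BITS_PER_SAMPLE,
                     channels=PCM_CHANNELS):
    """Return the raw PCM bitrate in kbps."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratio(mp3_kbps):
    """Return how many times smaller the coded stream is than raw PCM."""
    return pcm_bitrate_kbps() / mp3_kbps


if __name__ == "__main__":
    print(f"PCM: {pcm_bitrate_kbps():.0f} kbps")
    for rate in MP3_BITRATES_KBPS:
        print(f"MP3 at {rate} kbps: {compression_ratio(rate):.2f}x smaller")
```

Under these assumptions the PCM stream runs at 256 kbps, so coding at 32 kbps gives a ratio of 8, which matches the disk space saving quoted in the abstract.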


Cite as: Sirum Ng, P., Sanches, I. (2004) The influence of audio compression on speech recognition systems. Proc. 9th Conference on Speech and Computer (SPECOM 2004), 128-131

@inproceedings{sirumng04_specom,
  author={Paulo {Sirum Ng} and Ivandro Sanches},
  title={{The influence of audio compression on speech recognition systems}},
  year=2004,
  booktitle={Proc. 9th Conference on Speech and Computer (SPECOM 2004)},
  pages={128--131}
}