7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Speech Coding and Transmission for Improved Automatic Recognition

Xin Zhong, Jon A. Arrowood, Mark A. Clements

Georgia Institute of Technology, USA

Automatic recognition of compressed speech, in applications such as voice mail or call centers, performs significantly worse than recognition of uncompressed speech when background noise is present. Recognition of transmitted speech, as in cellular, voice over IP, or networked PDA input, may additionally suffer from frame erasures. Various receiver-based techniques have been proposed to compensate for these two distortions, but their room for improvement may be limited. Since the demand for recognition of coded and transmitted speech is expected to increase significantly in the near future, it is of interest to determine what modifications can be made on the encoder/transmitter side. In this paper we explore issues in designing a speech coder aimed at improving recognition performance over a packet-lossy channel with minimal degradation in perceptual quality. We propose a multiple description version of a speech coder to alleviate distortions caused by frame erasures. We also propose a coder variation that uses mel-cepstral coefficients instead of linear prediction parameters as the spectral specifier, allowing better recognition in noisy environments when the raw coder parameters are accessible at the receiver.


Bibliographic reference. Zhong, Xin / Arrowood, Jon A. / Clements, Mark A. (2002): "Speech coding and transmission for improved automatic recognition", in ICSLP-2002, 1845-1848.