7th International Conference on Spoken Language Processing, September 16-20, 2002
Automatic recognition of compressed speech, in applications such as voice mail or call centers, performs significantly worse than recognition of non-compressed speech when background noise is present. Recognition of transmitted speech, such as in cellular, voice over IP, or networked PDA input, may also face the problem of frame erasures. There have been various attempts to compensate for these two distortions using receiver-based techniques, but room for improvement may be limited. Since the demand for recognition of coded and transmitted speech is expected to increase significantly in the near future, it is of interest to determine what modifications can be made on the encoder/transmitter side. In this paper we explore issues in designing a speech coder aimed at improving recognition performance over a packet-lossy channel with minimal degradation in perceptual quality. We propose a multiple description version of a speech coder to alleviate distortions caused by frame erasures. We also propose a coder variation that uses mel-cepstral coefficients instead of linear prediction parameters as the spectral specifier, allowing better recognition in noisy environments when the raw coder parameters are accessible at the receiver.
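To make the multiple description idea concrete, here is a minimal sketch of one generic scheme, not the coder proposed in the paper. It assumes each frame is a short vector of spectral parameters, that even-indexed and odd-indexed frames travel in separate packets, and that a lost packet is concealed by averaging the neighboring frames that did arrive. The names `split_descriptions` and `merge_descriptions` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Frame = List[float]  # assumed: one vector of spectral parameters per frame


def split_descriptions(frames: Sequence[Frame]) -> Tuple[List[Frame], List[Frame]]:
    """Description 0 carries even-indexed frames, description 1 odd-indexed."""
    return list(frames[0::2]), list(frames[1::2])


def merge_descriptions(
    even: Optional[Sequence[Frame]],
    odd: Optional[Sequence[Frame]],
    total: int,
) -> List[Frame]:
    """Rebuild `total` frames; a lost description is passed as None."""
    if even is None and odd is None:
        raise ValueError("both descriptions lost")

    out: List[Optional[Frame]] = [None] * total
    if even is not None:
        out[0::2] = [list(f) for f in even]
    if odd is not None:
        out[1::2] = [list(f) for f in odd]

    # Conceal each missing frame from its received neighbors.
    # Only one description can be missing here, so neighbors are real data.
    for i in range(total):
        if out[i] is not None:
            continue
        neighbors = [
            out[j] for j in (i - 1, i + 1)
            if 0 <= j < total and out[j] is not None
        ]
        if not neighbors:
            raise ValueError("no received neighbor to interpolate from")
        out[i] = [sum(vals) / len(neighbors) for vals in zip(*neighbors)]

    return [f for f in out if f is not None]


frames = [[0.0], [1.0], [2.0], [3.0], [4.0]]
even, odd = split_descriptions(frames)
recovered_without_odd = merge_descriptions(even, None, len(frames))
recovered_without_even = merge_descriptions(None, odd, len(frames))
```

With both descriptions received the frames are reproduced exactly. With one lost, every frame is still present but half are interpolated, so the recognizer sees a smoothed parameter track instead of a gap.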
Bibliographic reference. Zhong, Xin / Arrowood, Jon A. / Clements, Mark A. (2002): "Speech coding and transmission for improved automatic recognition", In ICSLP-2002, 1845-1848.