8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Efficient Sub-optimal Temporal Decomposition with Dynamic Weighting of Speech Signals for Coding Applications

David Malah, Slava Shectman

Technion IIT, Israel

The Optimized Temporal Decomposition (OTD) technique for representing the speech spectral envelope with Line Spectral Frequencies (LSF), under an MMSE criterion, has been shown to be promising for very low bit rate speech coding in storage and broadcast applications. To improve perceived speech quality, this work introduces a dynamically weighted OTD (DW-OTD) technique, which extends OTD by allowing the error weights to change over time. Using Gardner's weighted MSE with DW-OTD is found to reduce the Log Spectral Distance (LSD) by 0.3 dB compared with OTD. The delay and complexity requirements of the original OTD algorithm make it inappropriate for real-time speech coding. This paper therefore also introduces a modification of the technique that is sub-optimal but suitable for on-line speech coding, with negligible performance degradation (only about 0.06 dB in LSD). With the proposed techniques we were able to encode speech spectral envelopes at 300-370 bps with an LSD of 2.25-2.1 dB, respectively, and a delay of just 7 frames.
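To make the idea of time-varying weights concrete, the following is a minimal sketch, not the authors' algorithm. It assumes the common two-event form of temporal decomposition, in which each LSF frame is approximated as `phi * a_left + (1 - phi) * a_right` between two neighbouring event targets, and it solves for the event-function value `phi` of a single frame under per-coefficient weights supplied for that frame. The function names, the two-event model, and the example numbers are all illustrative assumptions; Gardner's weighting itself is not reproduced here.

```python
def weighted_event_value(y, a_left, a_right, w):
    """Weighted least-squares event-function value for one frame.

    Assumed model: y ~ phi * a_left + (1 - phi) * a_right, with phi in [0, 1].
    `w` holds the (hypothetical) per-coefficient weights for this frame, so a
    different `w` may be passed for every frame.
    """
    num = sum(wi * (yi - ar) * (al - ar)
              for yi, al, ar, wi in zip(y, a_left, a_right, w))
    den = sum(wi * (al - ar) ** 2
              for al, ar, wi in zip(a_left, a_right, w))
    if den == 0.0:
        # Targets coincide (or all weights are zero): phi has no effect.
        return 0.5
    # Clipping the unconstrained minimiser of a 1-D convex quadratic
    # gives the minimiser over the interval [0, 1].
    return min(1.0, max(0.0, num / den))


def weighted_sq_error(y, a_left, a_right, phi, w):
    """Weighted squared reconstruction error for one frame."""
    return sum(wi * (yi - (phi * al + (1.0 - phi) * ar)) ** 2
               for yi, al, ar, wi in zip(y, a_left, a_right, w))


if __name__ == "__main__":
    # Illustrative 2-coefficient frame, not real LSF data.
    a_left, a_right = [0.2, 0.6], [0.4, 1.0]
    frame = [0.33, 0.95]
    for weights in ([1.0, 1.0], [10.0, 1.0]):
        phi = weighted_event_value(frame, a_left, a_right, weights)
        err = weighted_sq_error(frame, a_left, a_right, phi, weights)
        print(weights, round(phi, 4), round(err, 6))
```

Changing the weights from frame to frame shifts `phi` toward the fit that favours the more heavily weighted coefficients, which is the effect a dynamically weighted criterion is meant to exploit.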

Bibliographic reference. Malah, David / Shectman, Slava (2004): "Efficient sub-optimal temporal decomposition with dynamic weighting of speech signals for coding applications", In INTERSPEECH-2004, 2001-2004.