A recent trend in ultra low bit-rate speech coding is segment quantization based on the unit-selection principle, using large continuous codebooks as the unit database. We show that such large unit databases allow speech to be reconstructed at the decoder using the best unit's own residual (as stored in the unit database), thereby obviating the need to transmit any side information about the residual of the input speech. This requires the spectral and residual information to be jointly quantized at the encoder during unit selection, and we propose various composite measures for such joint spectral-residual quantization within a previously proposed unit-selection algorithm. We realize ultra low bit-rate speaker-dependent speech coding at an overall rate of 250 bits/sec using unit databases indexed with 19 bits/unit (524288 phone-like units, or about 6 hours of speech), with spectral distortions below 2.5 dB, while retaining intelligibility, naturalness, prosody and speaker identity.
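The idea of joint spectral-residual selection can be illustrated with a minimal sketch. The abstract does not specify the composite measures, so the weighted sum below, the weight `alpha`, the squared-error distances, and the dictionary layout with `spectrum` and `residual` keys are all assumptions made for illustration only. The sketch also selects each segment independently and ignores segmentation and any concatenation costs that a full unit-selection algorithm would use.

```python
def index_bits(num_units):
    """Bits needed to address one unit in a database of num_units entries."""
    return (num_units - 1).bit_length()


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def composite_distance(segment, unit, alpha=0.5):
    """Hypothetical composite measure (an assumption, not the paper's):
    a weighted sum of a spectral distance and a residual distance."""
    d_spec = sq_dist(segment["spectrum"], unit["spectrum"])
    d_res = sq_dist(segment["residual"], unit["residual"])
    return alpha * d_spec + (1.0 - alpha) * d_res


def select_unit(segment, database, alpha=0.5):
    """Encoder side: return the index of the unit minimizing the composite
    distance. Only this index would be transmitted."""
    return min(range(len(database)),
               key=lambda i: composite_distance(segment, database[i], alpha))


def decode_unit(index, database):
    """Decoder side: look up the unit and reuse its stored spectrum and
    residual, so no residual information from the input is needed."""
    unit = database[index]
    return unit["spectrum"], unit["residual"]


# Toy usage with made-up two-dimensional features.
toy_database = [
    {"spectrum": [0.0, 0.0], "residual": [1.0, 1.0]},
    {"spectrum": [0.1, 0.0], "residual": [0.0, 0.0]},
]
toy_segment = {"spectrum": [0.0, 0.0], "residual": [0.0, 0.0]}
chosen = select_unit(toy_segment, toy_database, alpha=0.5)
spectrum, residual = decode_unit(chosen, toy_database)
```

With `alpha=1.0` the selection reduces to a purely spectral match, which in the toy data picks a unit whose residual is far from that of the input; a smaller `alpha` trades a little spectral accuracy for a better-matching residual, which is the motivation for quantizing the two jointly.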
Bibliographic reference. Ramasubramanian, V. / Harish, D. (2009): "Ultra low bit-rate speech coding based on unit-selection with joint spectral-residual quantization: no transmission of any residual information", In INTERSPEECH-2009, 2615-2618.