11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Non-Linear Predictive Vector Quantization of Feature Vectors for Distributed Speech Recognition

Jose Enrique Garcia, Alfonso Ortega, Antonio Miguel, Eduardo Lleida

Universidad de Zaragoza, Spain

In this paper, we present a non-linear prediction scheme based on a Multi-Layer Perceptron for Predictive Vector Quantization (PVQ-MLP) of MFCC, aimed at very low bit-rate coding of acoustic features in distributed speech recognition (DSR). Applications such as voice-enabled web browsing or speech-controlled processes in large industrial plants, where hundreds of users simultaneously access the same ASR server, can benefit from this substantial bit-rate reduction. Experimental results obtained on a large-vocabulary task show that PVQ-MLP improves on a linear prediction scheme in terms of both prediction gain and WER, especially at low bit-rates. Using PVQ-MLP, the bit-rate can be reduced down to 1.8 kbps, a reduction of 66% with respect to the ETSI standards (4.4 kbps), with a WER degradation lower than 5% compared to a system without quantization.
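The closed-loop structure that predictive vector quantization relies on can be illustrated with a minimal sketch. It is not the authors' implementation: the one-hidden-layer perceptron, its untrained weights (W_HIDDEN, W_OUT), the toy two-dimensional frames, the five-entry codebook, and the use of a single previous frame as predictor input are all assumptions made for illustration. The encoder quantizes only the prediction residual and sends a codebook index per frame; the decoder runs the same predictor on its own reconstructions, so both sides stay in step.

```python
import math

def mlp_predict(prev, w_hidden, w_out):
    """One-hidden-layer perceptron: tanh hidden units, linear output layer."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, prev))) for row in w_hidden]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]

def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))

def pvq_encode(frames, codebook, predict):
    """Closed-loop predictive VQ: quantize the prediction residual of each frame."""
    prev = [0.0] * len(frames[0])   # initial state shared with the decoder
    indices, recon = [], []
    for x in frames:
        pred = predict(prev)
        residual = [a - b for a, b in zip(x, pred)]
        idx = nearest(codebook, residual)
        # Predict from the reconstructed frame, never the original,
        # so that the decoder can reproduce the same prediction.
        prev = [p + c for p, c in zip(pred, codebook[idx])]
        indices.append(idx)
        recon.append(prev)
    return indices, recon

def pvq_decode(indices, codebook, predict, dim):
    """Rebuild the frames from the transmitted codebook indices alone."""
    prev = [0.0] * dim
    out = []
    for idx in indices:
        pred = predict(prev)
        prev = [p + c for p, c in zip(pred, codebook[idx])]
        out.append(prev)
    return out

# Hypothetical, untrained weights and toy data (illustration only).
W_HIDDEN = [[0.8, 0.1], [-0.3, 0.9], [0.2, 0.2]]
W_OUT = [[0.7, -0.1, 0.2], [0.1, 0.6, 0.1]]
CODEBOOK = [[0.0, 0.0], [0.25, 0.0], [-0.25, 0.0], [0.0, 0.25], [0.0, -0.25]]
FRAMES = [[0.5, -0.2], [0.6, -0.1], [0.4, 0.0]]

def predictor(prev):
    return mlp_predict(prev, W_HIDDEN, W_OUT)

indices, recon = pvq_encode(FRAMES, CODEBOOK, predictor)
decoded = pvq_decode(indices, CODEBOOK, predictor, dim=2)
```

In this sketch only `indices` would be transmitted, so the cost per frame is the number of bits needed to address the codebook. A linear predictor can be swapped in by passing a different `predict` callable, which is the comparison the abstract describes.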


Bibliographic reference.  Garcia, Jose Enrique / Ortega, Alfonso / Miguel, Antonio / Lleida, Eduardo (2010): "Non-linear predictive vector quantization of feature vectors for distributed speech recognition", In INTERSPEECH-2010, 2378-2381.