EUROSPEECH 2003 - INTERSPEECH 2003
8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003

        

Towards Optimal Encoding for Classification with Applications to Distributed Speech Recognition

Naveen Srinivasamurthy, Antonio Ortega, Shrikanth Narayanan

University of Southern California, USA

In distributed classification applications, computational constraints require that data acquired by low-complexity clients be compressed and transmitted to a remote server for classification. In this paper, the design of optimal quantization for distributed classification applications is considered and evaluated in the context of a speech recognition task. The proposed encoder minimizes the detrimental effect compression has on classification performance. Specifically, the proposed methods concentrate on designing low-dimensional encoders, in which individual encoders independently quantize sub-dimensions of the high-dimensional vector used for classification. The main novelty of the work is the introduction of mutual information as a metric for designing compression algorithms in classification applications. Given a rate constraint, the proposed algorithm minimizes the mutual information loss due to compression. Equivalently, it ensures that the compressed data used for classification retains maximal information about the class labels. An iterative empirical algorithm (similar to the Lloyd algorithm) is provided to design quantizers for this new distortion measure. Mutual information is also used to propose a rate-allocation scheme in which rates are allocated to the independently encoded sub-dimensions of a vector so as to satisfy a given rate constraint. The results obtained indicate that mutual information is a better metric than mean square error for optimizing encoders used in distributed classification applications. In a distributed spoken names recognition task, the proposed mutual-information-based rate allocation reduces the increase in word error rate (WER) due to compression by a factor of six compared to a heuristic rate allocation.
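
To illustrate the metric described in the abstract, the following minimal sketch estimates the empirical mutual information between a quantizer's output index and the class label, and compares two scalar quantizers that use the same number of cells. It is not the authors' algorithm: the toy data, the thresholds, and the function names (`mutual_information`, `quantize`, `mi_after_quantization`) are assumptions made only for this example.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Empirical mutual information I(A;B) in bits from (a, b) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi

def quantize(x, thresholds):
    """Index of the quantization cell containing x."""
    return sum(x >= t for t in thresholds)

# Hypothetical toy data: (scalar feature, class label)
samples = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
           (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]

def mi_after_quantization(thresholds):
    """Mutual information between quantizer index and class label."""
    return mutual_information(
        [(quantize(x, thresholds), c) for x, c in samples]
    )

# Both quantizers have two cells (1 bit per sample), but they retain
# different amounts of information about the class label.
good = mi_after_quantization([0.5])   # boundary separates the classes
poor = mi_after_quantization([0.25])  # boundary splits class 0

print(f"boundary at 0.50: {good:.3f} bits")
print(f"boundary at 0.25: {poor:.3f} bits")
```

On this toy data the class label carries exactly 1 bit, so the first quantizer loses no label information while the second loses most of it, even though both operate at the same rate. A Lloyd-style design in the spirit of the abstract would presumably iterate over cell boundaries to increase this quantity rather than to reduce mean square error.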

Bibliographic reference.  Srinivasamurthy, Naveen / Ortega, Antonio / Narayanan, Shrikanth (2003): "Towards optimal encoding for classification with applications to distributed speech recognition", In EUROSPEECH-2003, 1113-1116.