EUROSPEECH 2003 - INTERSPEECH 2003
This paper uses transform coding to compress feature vectors in distributed speech recognition applications. Feature vectors are first grouped into non-overlapping blocks and a transformation is applied to each block. Bits are then allocated non-uniformly to the elements of the resulting matrix according to their relative information content. Analysis of the amplitude distribution of these elements indicates that non-linear quantisation is more appropriate than linear quantisation. Comparative results, based on speech recognition accuracy, confirm this. RASTA filtering is also considered and is shown to reduce the temporal variation of the feature vector stream. Recognition tests demonstrate that compression to bit rates of 2400 bps, 1200 bps and 800 bps has very little effect on recognition accuracy for both clean and noisy speech. For example, at a bit rate of 1200 bps, recognition accuracy is 98.0% compared to 98.6% with no compression.
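The bit allocation and non-linear quantisation steps described above can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not specify the information measure or the quantiser, so the sketch assumes per-element variance as a stand-in for "information content", a standard greedy allocation rule (each extra bit is taken to reduce quantisation noise power by a factor of four), and mu-law companding as one example of a non-linear quantiser. The names `greedy_bit_allocation` and `mu_law_quantise` are hypothetical.

```python
import math


def greedy_bit_allocation(variances, total_bits):
    """Assign bits one at a time to the element with the largest remaining
    variance (assumed proxy for information content). Each assigned bit is
    assumed to cut that element's quantisation noise power by 4 (about 6 dB)."""
    remaining = list(variances)
    bits = [0] * len(variances)
    for _ in range(total_bits):
        k = max(range(len(remaining)), key=lambda i: remaining[i])
        bits[k] += 1
        remaining[k] /= 4.0
    return bits


def mu_law_quantise(x, bits, x_max=1.0, mu=255.0):
    """Quantise x with a mu-law compander followed by a uniform mid-rise
    quantiser, then expand back. Elements given zero bits decode to 0."""
    if bits == 0:
        return 0.0
    x = max(-x_max, min(x_max, x))  # clip to the quantiser range
    # Compress: small amplitudes get finer resolution than large ones.
    y = math.copysign(math.log1p(mu * abs(x) / x_max) / math.log1p(mu), x)
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(levels - 1, int((y + 1.0) / step))  # integer code to transmit
    y_hat = -1.0 + (index + 0.5) * step             # reconstruction level
    # Expand back to the original amplitude scale.
    return math.copysign(x_max * math.expm1(abs(y_hat) * math.log1p(mu)) / mu, y_hat)


if __name__ == "__main__":
    # Illustrative variances for four transformed elements (made-up values).
    allocation = greedy_bit_allocation([64.0, 8.0, 1.0, 0.5], total_bits=4)
    print(allocation)
    print(mu_law_quantise(0.3, bits=8))
```

In this sketch, elements with little variance receive no bits and are simply dropped at the decoder, which is one way a non-uniform allocation can reach low bit rates. The total number of bits per block would follow from the target bit rate and the block duration, neither of which is fixed by the abstract.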
Bibliographic reference. Milner, Ben P. (2003): "Non-linear compression of feature vectors using transform coding and non-uniform bit allocation", In EUROSPEECH-2003, 2697-2700.