Accessing Information in Spoken Audio
April 19-20, 1999
Two important components of a speech archiving system are the compression scheme and the search facility. We investigate two ways of providing these components. The first is to run the recogniser directly from the compressed speech: we show that even with a 2.4 kbit/sec codec it is possible to obtain good recognition results, but the search is slow. The second is to preprocess the speech and store the extra data in compressed form along with the speech. In the case of an RNN-HMM hybrid system, the posterior probabilities provide a suitable intermediate data format. Vector quantizing these at just 625 bits/sec enables the search to run at many times real time while still maintaining good recognition accuracy.
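The idea of vector quantizing per-frame posterior probabilities can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the toy three-class codebook, the squared Euclidean distance measure, and the frame rate and codebook size used in the bit-rate arithmetic. The abstract states only the resulting rate of 625 bits/sec.

```python
import math

def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec.

    Squared Euclidean distance is an assumption; the paper may use
    a different distortion measure.
    """
    best_i, best_d = 0, float("inf")
    for i, code in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(vec, code))
        if d < best_d:
            best_i, best_d = i, d
    return best_i

def quantize(frames, codebook):
    """Replace each posterior vector by the index of its nearest codebook entry."""
    return [nearest_code(f, codebook) for f in frames]

def dequantize(indices, codebook):
    """Recover approximate posterior vectors for the search stage."""
    return [codebook[i] for i in indices]

def bitrate(frames_per_sec, codebook_size):
    """Bits per second when each frame is stored as one fixed-length index."""
    return frames_per_sec * math.ceil(math.log2(codebook_size))

# Hypothetical toy codebook over three phone classes.
codebook = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
]
# Hypothetical posterior vectors, one per frame.
frames = [[0.8, 0.1, 0.1], [0.1, 0.2, 0.7]]

indices = quantize(frames, codebook)
approx = dequantize(indices, codebook)

# One arbitrary combination that reaches 625 bits/sec:
# 62.5 frames/sec with a 1024-entry (10-bit) codebook.
rate = bitrate(62.5, 1024)
```

Under these assumptions, only the stored indices need to be read at search time, so the acoustic model does not have to be rerun over the audio. That is consistent with the search speed-up described above, though the paper's actual configuration may differ.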
Bibliographic reference. Tucker, Roger / Robinson, Tony / Christie, James / Seymour, Carl (1999): "Recognition-compatible speech compression for stored speech", In Access-Audio-1999, 69-72.