ISCA Archive Interspeech 2008

Discriminative graph training for ultra-fast low-footprint speech indexing

Upendra Chaudhari, Hong-Kwang Jeff Kuo, Brian Kingsbury

We study low-complexity models for audio search. The indexing and retrieval system consists of automatic speech recognition (ASR), phone expansion, N-gram indexing, and approximate matching. The ASR component can vary tremendously in complexity, ranging from a simple speaker-independent system to a fully speaker-adapted system. In this paper, we focus on a speaker-independent system with a small number of Gaussians. Such a system, with ASR followed by phone expansion, provides a good balance between speed and accuracy: it allows large volumes of data to be processed and gives better retrieval performance than systems relying solely on phone recognition. We describe the use of discriminative training of a finite-state decoding graph to improve system accuracy while preserving speed of operation.
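To make the N-gram indexing and approximate-match stages concrete, the following is a minimal sketch and not the authors' implementation. It assumes each utterance has already been converted to a phone sequence (for example by ASR followed by phone expansion), indexes phone trigrams in an inverted index, and scores a query by the fraction of its distinct trigrams found in each utterance. The function names, the trigram order, the 0.5 threshold, and the toy phone strings are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def phone_ngrams(phones, n=3):
    """Return all contiguous phone N-grams of order n as tuples."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def build_index(docs, n=3):
    """Build an inverted index mapping each phone N-gram to utterance IDs.

    `docs` maps an utterance ID to its phone sequence (a list of strings).
    """
    index = defaultdict(set)
    for doc_id, phones in docs.items():
        for gram in phone_ngrams(phones, n):
            index[gram].add(doc_id)
    return index


def approximate_match(index, query_phones, n=3, threshold=0.5):
    """Score utterances by the fraction of distinct query N-grams they contain.

    Returns {utterance ID: score} for scores at or above `threshold`.
    The scoring rule is an assumption of this sketch, not the paper's method.
    """
    grams = set(phone_ngrams(query_phones, n))
    if not grams:
        return {}
    counts = defaultdict(int)
    for gram in grams:
        for doc_id in index.get(gram, ()):
            counts[doc_id] += 1
    total = len(grams)
    return {d: c / total for d, c in counts.items() if c / total >= threshold}


if __name__ == "__main__":
    # Toy phone sequences (hypothetical data).
    docs = {
        "utt1": "s p iy ch r eh k".split(),
        "utt2": "m y uw z ih k".split(),
    }
    index = build_index(docs)
    # A query with one substituted phone can still match on the remaining trigrams.
    print(approximate_match(index, "s b iy ch r eh k".split()))
```

Because matching is done on overlapping N-grams rather than whole words, a single phone error in the query or the transcript removes only a few N-grams from the overlap, which is what makes the match approximate.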

doi: 10.21437/Interspeech.2008-569

Cite as: Chaudhari, U., Kuo, H.-K.J., Kingsbury, B. (2008) Discriminative graph training for ultra-fast low-footprint speech indexing. Proc. Interspeech 2008, 2175-2178, doi: 10.21437/Interspeech.2008-569

  author={Upendra Chaudhari and Hong-Kwang Jeff Kuo and Brian Kingsbury},
  title={{Discriminative graph training for ultra-fast low-footprint speech indexing}},
  booktitle={Proc. Interspeech 2008},