ISCA Archive Interspeech 2005

Online speaker adaptation and tracking for real-time speech recognition

Daben Liu, Daniel Kiecza, Amit Srivastava, Francis Kubala

This paper describes a low-latency online speaker adaptation framework. Its main objective is to apply fast speaker adaptation to a real-time (RT) large-vocabulary continuous speech recognition (LVCSR) engine. In this framework, speaker adaptation is performed on speaker turns generated by online speaker change detection and speaker clustering. To maximize long-term system performance, the adaptation statistics for known speakers are updated continuously while new speakers are actively discovered. In contrast to existing approaches, the framework eliminates the re-decoding of an utterance after adaptation. We demonstrate that the framework can be easily incorporated into every pass of a multi-pass decoder. We applied the framework to the BBN Audio Indexer, a real-time end-to-end audio indexing system that runs at around 0.6xRT. The result is an 8%-12% relative word-error-rate reduction on broadcast news benchmark tests for English, Chinese, and Arabic, at a cost of less than 0.1xRT in real-time throughput.
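
The control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names, the Euclidean distance test for speaker clustering, the `new_speaker_threshold` parameter, and the mean-subtraction "adaptation" are all hypothetical stand-ins chosen to keep the example small. What it tries to show is the ordering only: each speaker turn is adapted with statistics accumulated from that speaker's earlier turns, the statistics are then updated, and no turn is decoded twice.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Vector = List[float]


@dataclass
class SpeakerStats:
    """Running statistics for one speaker cluster (hypothetical stand-in)."""
    count: int = 0
    total: Vector = field(default_factory=list)

    def update(self, frames: List[Vector]) -> None:
        # Accumulate per-dimension sums so the mean can be refreshed cheaply.
        for frame in frames:
            if not self.total:
                self.total = [0.0] * len(frame)
            self.total = [t + x for t, x in zip(self.total, frame)]
            self.count += 1

    def mean(self) -> Vector:
        return [t / self.count for t in self.total]


class OnlineSpeakerTracker:
    """Assigns each turn to a known or new speaker, then adapts it."""

    def __init__(self, new_speaker_threshold: float) -> None:
        # Assumption: a turn farther than this from every known speaker
        # is treated as a newly discovered speaker.
        self.threshold = new_speaker_threshold
        self.speakers: Dict[int, SpeakerStats] = {}

    def _closest_speaker(self, turn_mean: Vector) -> Optional[int]:
        best_id, best_dist = None, self.threshold
        for sid, stats in self.speakers.items():
            dist = sum((a - b) ** 2
                       for a, b in zip(stats.mean(), turn_mean)) ** 0.5
            if dist < best_dist:
                best_id, best_dist = sid, dist
        return best_id

    def process_turn(self, frames: List[Vector]) -> Tuple[int, List[Vector]]:
        if not frames:
            raise ValueError("a speaker turn must contain at least one frame")
        turn = SpeakerStats()
        turn.update(frames)
        sid = self._closest_speaker(turn.mean())
        if sid is None:
            # New speaker: nothing accumulated yet, so pass frames unadapted.
            sid = len(self.speakers)
            self.speakers[sid] = SpeakerStats()
            adapted = [list(f) for f in frames]
        else:
            # Known speaker: adapt with statistics from EARLIER turns only,
            # so the current turn never needs a second decoding pass.
            bias = self.speakers[sid].mean()
            adapted = [[x - b for x, b in zip(f, bias)] for f in frames]
        # Keep improving this speaker's statistics for future turns.
        self.speakers[sid].update(frames)
        return sid, adapted
```

In a real recognizer the mean-subtraction step would be replaced by whatever adaptation transform the decoder supports, and `adapted` would be handed to the decoder for a single pass over the turn.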

doi: 10.21437/Interspeech.2005-158

Cite as: Liu, D., Kiecza, D., Srivastava, A., Kubala, F. (2005) Online speaker adaptation and tracking for real-time speech recognition. Proc. Interspeech 2005, 281-284, doi: 10.21437/Interspeech.2005-158

  author={Daben Liu and Daniel Kiecza and Amit Srivastava and Francis Kubala},
  title={{Online speaker adaptation and tracking for real-time speech recognition}},
  booktitle={Proc. Interspeech 2005},