Interspeech'2005 - Eurospeech
This paper describes a low-latency online speaker adaptation framework. The main objective is to apply fast speaker adaptation to a real-time (RT) large vocabulary continuous speech recognition (LVCSR) engine. In this framework, speaker adaptation is performed on speaker turns generated by online speaker change detection and speaker clustering. To maximize long-term system performance, the adaptation statistics for known speakers are updated continuously while new speakers are actively discovered. Unlike existing approaches, the framework does not re-decode an utterance after adaptation. We demonstrate that the framework can be easily incorporated into every pass of a multi-pass decoder. We applied the framework to the BBN Audio Indexer, a real-time end-to-end audio indexing system that runs at around 0.6xRT. The result is an 8% to 12% relative word-error-rate reduction on broadcast news benchmark tests for English, Chinese, and Arabic, at a cost of less than 0.1xRT in real-time throughput.
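The control flow described in the abstract can be illustrated with a toy sketch. The abstract does not specify the clustering criterion or the form of the adaptation statistics, so everything here is an assumption made for illustration: the class `SpeakerTracker`, the Euclidean distance between mean feature vectors, the `new_speaker_threshold` parameter, and the use of a running feature sum as the "statistics" are all hypothetical stand-ins, not the authors' method. The point is only the ordering: each turn is decoded with the statistics already accumulated for its speaker, and the statistics are updated afterwards, so no utterance is decoded twice.

```python
import math


class SpeakerTracker:
    """Toy online speaker tracker (illustrative assumption, not the paper's algorithm)."""

    def __init__(self, new_speaker_threshold):
        # Hypothetical parameter: maximum distance for matching a known speaker.
        self.threshold = new_speaker_threshold
        # Per-speaker running statistics: frame count and per-dimension feature sum.
        self.speakers = []

    @staticmethod
    def _turn_mean(frames):
        # Mean feature vector of one speaker turn (a list of equal-length frames).
        return [sum(col) / len(frames) for col in zip(*frames)]

    def _assign(self, frames):
        # Find the closest known speaker; open a new cluster if none is close enough.
        mean = self._turn_mean(frames)
        best_id, best_dist = None, math.inf
        for i, spk in enumerate(self.speakers):
            spk_mean = [s / spk["count"] for s in spk["sum"]]
            dist = math.dist(mean, spk_mean)
            if dist < best_dist:
                best_id, best_dist = i, dist
        if best_id is None or best_dist > self.threshold:
            self.speakers.append({"count": 0, "sum": [0.0] * len(mean)})
            best_id = len(self.speakers) - 1
        return best_id

    def process_turn(self, frames):
        """Return (speaker_id, frames_available_for_adaptation_before_this_turn)."""
        speaker_id = self._assign(frames)
        spk = self.speakers[speaker_id]
        # A real system would decode the turn here using the speaker's
        # existing statistics; this sketch only reports how much data they cover.
        available = spk["count"]
        # Update the statistics after decoding, for use on this speaker's later turns.
        spk["count"] += len(frames)
        spk["sum"] = [s + sum(col) for s, col in zip(spk["sum"], zip(*frames))]
        return speaker_id, available


tracker = SpeakerTracker(new_speaker_threshold=1.0)
first = tracker.process_turn([[0.0, 0.0], [0.2, 0.0]])   # first speaker appears
second = tracker.process_turn([[5.0, 5.0], [5.0, 5.2]])  # distant turn: new speaker
third = tracker.process_turn([[0.1, 0.1]])               # first speaker returns
```

In this sketch a newly discovered speaker is decoded with no accumulated statistics, while a returning speaker benefits from everything gathered on earlier turns, which mirrors the abstract's emphasis on long-term performance for known speakers.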
Bibliographic reference. Liu, Daben / Kiecza, Daniel / Srivastava, Amit / Kubala, Francis (2005): "Online speaker adaptation and tracking for real-time speech recognition", In INTERSPEECH-2005, 281-284.