Interspeech'2005 - Eurospeech
In generic automatic speech recognition (ASR) systems, language models (LMs) are typically trained to work across a broad range of input conditions. ASR systems used in domain-specific spoken dialogue systems (SDSs) are more constrained in content and style, and a mismatch in content and/or style between training and operating conditions degrades the performance of the dialogue application. This paper aims to develop tools that facilitate rapid development of spoken dialogue applications. It focuses on language model training, and specifically on automatically collecting text data that is useful for training accurate language models for a new target domain, without manually collecting any in-domain data. We investigate a framework to extract useful information from previous domains and the World Wide Web (WWW). We collect data by submitting queries to a search engine and then clean the resulting text via syntactic and semantic filtering. This is followed by artificial sentence generation. Without using any in-domain data, our system achieved a word error rate (WER) of 19.33%, comparable to that of a language model trained on 32K manually collected in-domain sentences. Using less than 1% of in-domain data along with the automatically generated text, our system achieved ASR performance close to that of a language model trained on 60K in-domain sentences.
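The abstract reports results as word error rate (WER). As a reminder of what that metric measures, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the paper's own scoring tool and text normalization are not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is a plain whitespace split, which is
    an assumption and not the scoring procedure used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER of 19.33% corresponds to roughly 19 word errors per 100 reference words. Because insertions are counted, the value can exceed 1.0 for very poor hypotheses.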
Bibliographic reference. Akbacak, Murat / Gao, Yuqing / Gu, Liang / Kuo, Hong-Kwang Jeff (2005): "Rapid transition to new spoken dialogue domains: language model training using knowledge from previous domain applications and web text resources", In INTERSPEECH-2005, 1873-1876.