Updating acoustic and language models is vital to maintaining the performance of automatic speech recognition (ASR) systems. To reduce the effort required for these updates, we propose a "semi-automated" framework for the ASR system of the Japanese National Congress. The framework consists of our speaking-style transformation (SST) and lightly-supervised training (LSV) approaches, which automatically generate spoken-style training texts and labels from documents such as meeting minutes. An experimental evaluation demonstrated that this update framework improved ASR performance on the latest meeting data. We also describe a method for estimating ASR accuracy based on SST, which uses the minutes as reference texts and does not require verbatim transcripts.
Bibliographic reference. Akita, Yuya / Mimura, Masato / Neubig, Graham / Kawahara, Tatsuya (2010): "Semi-automated update of automatic transcription system for the Japanese national congress", In INTERSPEECH-2010, 338-341.