This paper presents an automatic speech recognition (ASR) system that assists in creating meeting records for the National Congress of Japan. The system is designed to cope with the spontaneous characteristics of meeting speech as well as a wide variety of topics and speakers. For the acoustic model, minimum phone error (MPE) training is applied together with several normalization techniques. For the language model, we use our statistical style transformation, which generates spoken-style N-grams and their statistics. We also introduce statistical modeling of pronunciation variation in spontaneous speech. The ASR system was evaluated on real congressional meetings and achieved a word accuracy of 84%. The results also suggest that ASR-based transcripts at this accuracy level are usable for editing meeting records.
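Word accuracy is conventionally defined as (N - S - D - I) / N, where N is the number of reference words and S, D, I are the substitutions, deletions, and insertions in a minimum-cost alignment of the hypothesis against the reference. Because the unit-cost Levenshtein distance equals S + D + I for that optimal alignment, the metric can be sketched as below. This is a minimal illustration assuming whitespace-tokenized words; the abstract does not state the exact scoring protocol (for Japanese, transcripts would first need word segmentation, e.g. by a morphological analyzer), and the function names here are hypothetical.

```python
def edit_distance(ref, hyp):
    """Unit-cost Levenshtein distance between two token lists (= S + D + I)."""
    n, m = len(ref), len(hyp)
    # prev[j] holds the distance between ref[:i-1] and hyp[:j]
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            sub = prev[j - 1] + (ref[i - 1] != hyp[j - 1])  # match or substitution
            dele = prev[j] + 1                              # deletion of ref word
            ins = cur[j - 1] + 1                            # insertion of hyp word
            cur[j] = min(sub, dele, ins)
        prev = cur
    return prev[m]


def word_accuracy(ref, hyp):
    """Word accuracy = (N - S - D - I) / N; can be negative with many insertions."""
    n = len(ref)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return (n - edit_distance(ref, hyp)) / n


ref = "the committee will now begin the session".split()
hyp = "the committee will begin the the session".split()
print(edit_distance(ref, hyp))   # 2
print(word_accuracy(ref, hyp))   # 5/7, about 0.714
```

Unlike word correctness, (N - S - D) / N, this measure also penalizes insertions, which is why it can fall below zero for hypotheses much longer than the reference.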
Bibliographic reference. Akita, Yuya / Mimura, Masato / Kawahara, Tatsuya (2009): "Automatic transcription system for meetings of the Japanese national congress", In INTERSPEECH-2009, 84-87.