Sixth European Conference on Speech Communication and Technology
In this paper we report recent developments on the meeting transcription task, a large vocabulary conversational speech recognition task. Previous experiments showed this is a very challenging task, with about 50% word error rate (WER) using existing recognizers. The difficulty mostly comes from highly disfluent/conversational nature of meetings, and lack of domain specific training data. For the first problem, our SWB(Switchboard) system — a conversational telephone speech recognizer —was used to recognize wide-band meeting data; for the latter, we leveraged the large amount of Broadcast News (BN) data to build a robust system. This paper will especially focus on two experiments in the BN system development: model combination and HMM topology/duration model-ing. Model combination can be done at various stages of recognition: post-processing schemes such as ROVER can lead to significant improvements; to reduce computation we tried model combination at acoustic score level. We will also show the importance of temporal constraints in decoding, present some HMM topology/duration modeling experiments. Finally, the meeting browser system and meeting room setup will be reviewed.
Full Paper (PDF) Gnu-Zipped Postscript
Bibliographic reference. Yu, Hua / Finke, Michael / Waibel, Alex (1999): "Progress in automatic meeting transcription", In EUROSPEECH'99, 695-698.