15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

An Initial Investigation of Long-Term Adaptation for Meeting Transcription

X. Chen (1), Mark J. F. Gales (1), Kate M. Knill (1), Catherine Breslin (1), Langzhou Chen (2), K. K. Chin (2), Vincent Wan (2)

(1) University of Cambridge, UK
(2) Toshiba Research Europe, UK

Meeting transcription is a very useful and challenging task. The majority of research to date has focused on individual meetings, or on only a small group of meetings. In many practical deployments, multiple related meetings will take place over a long period of time. This paper describes an initial investigation of how this long-term data can be used to improve meeting transcription. A corpus of technical meetings, recorded using a single microphone array, was collected over a two-year period, yielding a total of 179 hours of meeting data. Baseline systems using deep neural network acoustic models, in both Tandem and Hybrid configurations, and neural network-based language models are described. The impact of supervised and unsupervised adaptation of the acoustic models is then evaluated, as well as the impact of improved language models.

Bibliographic reference. Chen, X. / Gales, Mark J. F. / Knill, Kate M. / Breslin, Catherine / Chen, Langzhou / Chin, K. K. / Wan, Vincent (2014): "An initial investigation of long-term adaptation for meeting transcription", in INTERSPEECH-2014, 954-958.