Sixth European Conference on Speech Communication and Technology
Transcription of broadcast news shows (radio and television) is a major step in developing automatic tools for indexing and retrieving the vast amounts of information generated on a daily basis. Broadcast shows are challenging to transcribe as they consist of a continuous data stream with segments of different linguistic and acoustic natures. Transcribing such data requires addressing two main problems: those related to the varied acoustic properties of the signal, and those related to the linguistic properties of the speech. Prior to word transcription, the data is partitioned into homogeneous acoustic segments. Non-speech segments are identified and rejected, and the speech segments are clustered and labeled according to bandwidth and gender. The speaker-independent, large-vocabulary continuous speech recognizer makes use of n-gram statistics for language modeling and of continuous density HMMs with Gaussian mixtures for acoustic modeling. The LIMSI system has consistently obtained top-level performance in DARPA evaluations, with an overall word transcription error on the Nov98 evaluation test data of 13.6%. The average word error on unrestricted American English broadcast news data is under 20%.
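The word error figures quoted above are conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring procedure used in the DARPA evaluations (which also applies text normalization that is omitted here). The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    no case folding or other normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

Under this definition, a reference of six words recognized with one substitution and one deletion gives a word error of 2/6, or roughly 33%. Because insertions count as errors, the value can exceed 100% on very poor output.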
Bibliographic reference. Gauvain, Jean-Luc / Lamel, Lori / Adda, Gilles / Jardino, Michèle (1999): "Recent advances in transcribing television and radio broadcasts", in EUROSPEECH'99, 655-658.