2003 ISCA Workshop on
Multilingual Spoken Document Retrieval
(MSDR2003)
Hong Kong
April 4-5, 2003
Tracking Topics in Broadcast News Data
Yuen-Yee Lo, Jean-Luc Gauvain
Spoken Language Processing Group,
LIMSI-CNRS, Orsay, France
This paper describes a topic tracking system and its ability to
cope with sparse training data for broadcast news tracking. The
baseline tracker relies on a unigram topic model. In order
to compensate for the very small amount of training data for each
topic, document expansion is used in estimating the initial topic
model, and unsupervised model adaptation is carried out after processing
each test story. A new technique of variable weight unsupervised
online adaptation has been developed and was found to
outperform traditional fixed weight online adaptation. Combining
both document expansion and adaptation resulted in a 37% cost reduction
when tested on both English and machine-translated Mandarin
broadcast news data transcribed by an ASR system, with manual
story boundaries. Another challenging condition is one in which
the story boundaries are not known for the broadcast news data.
A window-based automatic story boundary detector has been developed
for the tracking system. The tracking results with the
window-based tracking system are comparable to those obtained
with state-of-the-art automatic story segmentation on the TDT3
corpus.
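
The abstract does not give the model equations, so the following is only a minimal sketch of how a unigram topic tracker with unsupervised online adaptation might look. Everything specific in it is an assumption made for illustration and is not taken from the paper: the class name UnigramTracker, the linear interpolation with a background model, the length-normalised log-likelihood ratio score, the threshold, adapting only on stories judged on-topic, and the rule that derives the variable adaptation weight from the score margin. Document expansion is not shown.

```python
import math
from collections import Counter


class UnigramTracker:
    """Toy unigram topic tracker with unsupervised online adaptation.

    Illustrative only: smoothing, scoring, and weighting rules are assumptions.
    """

    def __init__(self, topic_stories, background, lam=0.5, threshold=0.0):
        self.counts = Counter()          # (weighted) topic word counts
        for story in topic_stories:
            self.counts.update(story)
        self.background = background     # word -> general-language probability
        self.lam = lam                   # interpolation weight (assumed value)
        self.threshold = threshold       # decision threshold (assumed value)

    def _p_background(self, word):
        # Small floor for words missing from the background model.
        return self.background.get(word, 1e-6)

    def _p_topic(self, word):
        # Topic unigram probability, interpolated with the background model.
        total = sum(self.counts.values())
        p_ml = self.counts[word] / total if total else 0.0
        return self.lam * p_ml + (1.0 - self.lam) * self._p_background(word)

    def score(self, story):
        """Length-normalised log-likelihood ratio of topic vs. background."""
        if not story:
            return 0.0
        llr = sum(
            math.log(self._p_topic(w) / self._p_background(w)) for w in story
        )
        return llr / len(story)

    def track(self, story, variable_weight=True, fixed_weight=0.2):
        """Decide on-topic/off-topic, then adapt the model if on-topic."""
        s = self.score(story)
        on_topic = s > self.threshold
        if on_topic:
            if variable_weight:
                # Assumed rule: weight grows with the score margin, in (0, 1).
                weight = 1.0 - math.exp(-(s - self.threshold))
            else:
                weight = fixed_weight
            for w, c in Counter(story).items():
                self.counts[w] += weight * c
        return on_topic, s


background = {"election": 0.01, "vote": 0.01, "football": 0.01,
              "goal": 0.01, "the": 0.1}
tracker = UnigramTracker([["election", "vote", "the", "election"]], background)
hit, hit_score = tracker.track(["election", "vote"])
miss, miss_score = tracker.track(["football", "goal"])
```

In this sketch a confidently on-topic story contributes almost its full counts to the topic model, while a borderline one contributes little; passing variable_weight=False falls back to a fixed adaptation weight for comparison.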
Bibliographic reference.
Lo, Yuen-Yee / Gauvain, Jean-Luc (2003):
"Tracking topics in broadcast news data",
In MSDR-2003, 43-48.