8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Use of Metadata to Improve Recognition of Spontaneous Speech and Named Entities

Bhuvana Ramabhadran, Olivier Siohan, Geoffrey Zweig

IBM T.J. Watson Research Center, USA

With improved recognition accuracies for LVCSR tasks, it has become possible to search large collections of spontaneous speech for a variety of information. The MALACH corpus of Holocaust testimonials is one such collection, in which we are interested in automatically transcribing and retrieving portions that are relevant to named entities such as people, places, and organizations. Since the testimonials were gathered from thousands of people in countries throughout Europe, the set of potential named entities is extremely large, and this causes a well-known dilemma: increasing the size of the vocabulary allows more of these words to be recognized, but also increases confusability and can harm recognition performance. However, the MALACH corpus, like many other collections, includes side information or metadata that can be exploited to provide prior information on exactly which named entities are likely to appear. This paper proposes a method that capitalizes on this prior information to reduce named-entity recognition errors by over 50% relative, and simultaneously decrease the overall word error rate by 7% relative. The metadata we use derives from a pre-interview questionnaire that includes the names of friends and relatives, places visited, membership in organizations, synonyms of place names, and similar information. By augmenting the lexicon and language model with this information on a speaker-by-speaker basis, we are able to exploit the textual information that is already available in the corpus to facilitate much improved speech recognition.
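The speaker-by-speaker augmentation described above can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the entity lists, interpolation weight, and unigram probabilities are invented, and the real system operates on a full n-gram language model and pronunciation lexicon rather than the toy unigram mixture shown here.

```python
# Illustrative sketch: extend a base vocabulary with the named entities from a
# speaker's pre-interview questionnaire, and boost those entities in a unigram
# LM by interpolating in a uniform distribution over them.
# p'(w) = (1 - weight) * p(w) + weight * uniform(entities)
# All data and the weight value are made up for the example.

def augment_vocabulary(base_vocab, questionnaire_entities):
    """Return the base vocabulary extended with this speaker's entities."""
    return set(base_vocab) | set(questionnaire_entities)

def interpolate_unigrams(base_lm, entities, weight=0.1):
    """Mix a uniform distribution over the speaker's entities into a
    base unigram LM, keeping the result a proper distribution."""
    boost = weight / len(entities)
    lm = {w: (1.0 - weight) * p for w, p in base_lm.items()}
    for e in entities:
        lm[e] = lm.get(e, 0.0) + boost
    return lm

# Toy per-speaker usage:
base_vocab = {"the", "camp", "family"}
entities = ["auschwitz", "lodz"]  # hypothetical questionnaire entries
vocab = augment_vocabulary(base_vocab, entities)
lm = interpolate_unigrams({"the": 0.5, "camp": 0.3, "family": 0.2},
                          entities, weight=0.1)
```

The key design point is that the augmentation is applied per speaker, so the vocabulary grows only by that speaker's few dozen entities rather than by the union over all interviewees, sidestepping the confusability problem that a single enormous shared vocabulary would create.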


Bibliographic reference. Ramabhadran, Bhuvana / Siohan, Olivier / Zweig, Geoffrey (2004): "Use of metadata to improve recognition of spontaneous speech and named entities", in INTERSPEECH-2004, 381-384.