14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

KPCatcher — A Keyphrase Extraction System for Enterprise Videos

Yongxin Taylor Xi, Matthias Paulik, Venkata Ramana Gadde, Ananth Sankar

Cisco Systems, USA

This paper introduces KPCatcher (keyphrase catcher). The value of our work lies in providing concrete solutions for building a real keyphrase extraction product for enterprise videos. KPCatcher is designed to robustly extract a ranked list of keyphrases from enterprise videos, independent of the domain. It treats noun phrases in the transcript as candidate keyphrases and scores them by aggregating word-level scores. Using confidence-based and counting-based rules, KPCatcher handles transcription errors to prevent incorrect keyphrases from being surfaced to end users. Unlike previous work, we focus our experiments on automatic transcriptions of real enterprise videos from various domains. We thoroughly evaluate several well-known keyword ranking features and the denoising rules, using enterprise videos from several domains at various word error rates. We find term frequency to be the best feature and show that our denoising rules are very effective both in rejecting incorrect keyphrases and in increasing the overlap between top keyphrases and human-provided keyphrases. We also show that KPCatcher compares favorably to existing research systems on ICSI meeting data.
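The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the KPCatcher implementation; the abstract does not give the exact rules, so several details here are assumptions: the function name `rank_keyphrases`, the thresholds `min_conf` and `min_count`, the use of a sum to aggregate word-level scores, and the input format (tokens paired with recognizer confidences, plus noun-phrase spans produced by some external chunker). Term frequency is used as the word-level score because the abstract reports it as the best feature.

```python
from collections import Counter


def rank_keyphrases(words, candidates, min_conf=0.5, min_count=2):
    """Rank candidate noun phrases from a transcript (illustrative sketch only).

    words:      list of (token, confidence) pairs from the transcript.
    candidates: list of (start, end) index spans over `words`, assumed to
                come from an external noun-phrase chunker.
    min_conf, min_count: hypothetical thresholds, not values from the paper.
    """
    # Word-level score: term frequency over the whole transcript.
    tf = Counter(tok.lower() for tok, _ in words)

    scores = {}
    counts = Counter()
    for start, end in candidates:
        span = words[start:end]
        # Confidence-based rule (assumed form): skip an occurrence if any
        # of its words was recognized with low confidence.
        if any(conf < min_conf for _, conf in span):
            continue
        phrase = " ".join(tok.lower() for tok, _ in span)
        counts[phrase] += 1
        # Aggregate word-level scores into a phrase score (sum is an assumption).
        scores[phrase] = sum(tf[tok.lower()] for tok, _ in span)

    # Counting-based rule (assumed form): keep only phrases that were
    # confidently observed at least `min_count` times.
    kept = [(p, s) for p, s in scores.items() if counts[p] >= min_count]
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

In this sketch a phrase that appears only in low-confidence regions, or only once, never reaches the ranked list, which mirrors the stated goal of keeping transcription errors away from end users.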


Bibliographic reference.  Xi, Yongxin Taylor / Paulik, Matthias / Gadde, Venkata Ramana / Sankar, Ananth (2013): "KPCatcher — a keyphrase extraction system for enterprise videos", In INTERSPEECH-2013, 1906-1910.