ISCA Archive Eurospeech 2001

Phoneme-based topic spotting on the switchboard corpus

M. W. Theunissen, K. Scheffler, J. A. du Preez

The field of topic spotting in conversational speech deals with the problem of identifying "interesting" conversations or speech extracts amongst large volumes of speech data. In this research, two phoneme-based topic spotting systems were evaluated on the Switchboard Corpus. Experiments [1,2] on the OGI Corpus suggested that the new Stochastic Method for the Automatic Recognition of Topics (SMART) yields a large improvement over the existing Euclidean Nearest Wrong Neighbours (ENWN) algorithm, which had outperformed competing systems in evaluations [3,4]. However, the small amount of data available for these experiments meant that more rigorous testing was required. We reimplemented the algorithm to run on the larger Switchboard Corpus, and report an improvement of SMART over ENWN characterised by a 35.8% reduction in ROC (receiver operating characteristic) error area. Statistical significance was demonstrated using a modified version of the McNemar test.
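The two quantities in the last sentences can be made concrete with a small sketch. It assumes that "ROC error area" means the area above the ROC curve (1 minus the AUC), and it shows only the textbook McNemar test with continuity correction. The modified McNemar variant used in the paper is not described here and is not reproduced. The function names `roc_error_area`, `relative_reduction` and `mcnemar` are hypothetical helpers for illustration.

```python
import math


def roc_error_area(scores_pos, scores_neg):
    """Area above the ROC curve (1 - AUC); assumed meaning of 'ROC error area'.

    AUC is computed as the probability that a randomly chosen on-topic score
    exceeds a randomly chosen off-topic score, counting ties as one half.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    auc = wins / (len(scores_pos) * len(scores_neg))
    return 1.0 - auc


def relative_reduction(baseline, improved):
    """Fractional reduction of an error measure relative to a baseline."""
    return (baseline - improved) / baseline


def mcnemar(b, c):
    """Textbook McNemar test with continuity correction (not the paper's variant).

    b: trials where system A is correct and system B is wrong.
    c: trials where system A is wrong and system B is correct.
    Returns (chi-square statistic, p-value) for 1 degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0
    stat = max(0, abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 d.o.f.: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For instance, with hypothetical error areas of 0.200 for a baseline and 0.128 for a new system, `relative_reduction(0.200, 0.128)` gives about 0.36, which is the form in which the SMART-over-ENWN improvement is stated above. Only the discordant trials (`b` and `c`) enter the McNemar statistic, which is why it suits paired comparisons of two systems on the same test set.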


doi: 10.21437/Eurospeech.2001-92

Cite as: Theunissen, M.W., Scheffler, K., du Preez, J.A. (2001) Phoneme-based topic spotting on the switchboard corpus. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 283-286, doi: 10.21437/Eurospeech.2001-92

@inproceedings{theunissen01_eurospeech,
  author={M. W. Theunissen and K. Scheffler and J. A. du Preez},
  title={{Phoneme-based topic spotting on the switchboard corpus}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={283--286},
  doi={10.21437/Eurospeech.2001-92}
}