ISCA Archive Interspeech 2005

Speech activity detection fusing acoustic phonetic and energy features

Etienne Marcheret, Karthik Visweswariah, Gerasimos Potamianos

With the wider deployment of automatic speech recognition (ASR) systems, robust speech activity detection has become more important, both as a means of reducing bandwidth in client/server ASR and for overall system stability, from barge-in through the recognition process. In this paper we investigate a novel technique for speech activity detection that we have found to be effective in handling non-stationary noise events without negatively impacting the recognition process. The technique combines acoustic phonetic likelihood-based features with energy features extracted from the signal waveform. Reported results on two speech activity detection tasks demonstrate that the proposed method outperforms techniques that rely solely on acoustic or energy features.

doi: 10.21437/Interspeech.2005-118

Cite as: Marcheret, E., Visweswariah, K., Potamianos, G. (2005) Speech activity detection fusing acoustic phonetic and energy features. Proc. Interspeech 2005, 241-244, doi: 10.21437/Interspeech.2005-118

  author={Etienne Marcheret and Karthik Visweswariah and Gerasimos Potamianos},
  title={{Speech activity detection fusing acoustic phonetic and energy features}},
  booktitle={Proc. Interspeech 2005},