Interspeech'2005 - Eurospeech
We propose a new set of features based on the temporal statistics of the spectral entropy of speech, and we show why these features make good inputs for a speech detector. We also propose a back-end that uses the evidence from these features in a 'focused' manner. Recognition experiments show that this back-end leads to significant performance improvements, whereas merely appending the features to the standard feature vector does not improve performance. Finally, we report a 10% average improvement in word error rate over our baseline for the highly mismatched case in the Aurora3.0 corpus.
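To make the idea of "temporal statistics of the spectral entropy" concrete, the following is a minimal sketch and not the authors' implementation. It assumes the entropy is taken over the normalized short-time power spectrum of each frame, and that the temporal statistics are a running mean and variance over neighboring frames. The function names, frame length, hop size, and context width are hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np

def spectral_entropy(signal, frame_len=256, hop=128, eps=1e-12):
    """Per-frame entropy (in bits) of the normalized power spectrum.

    frame_len, hop and eps are illustrative assumptions, not values
    taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    entropies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Treat the power spectrum as a probability mass function.
        pmf = power / (power.sum() + eps)
        entropies[i] = -np.sum(pmf * np.log2(pmf + eps))
    return entropies

def temporal_stats(values, context=5):
    """Running mean and variance over a centered window of
    2 * context + 1 frames (edges are padded by repetition)."""
    values = np.asarray(values, dtype=float)
    padded = np.pad(values, context, mode="edge")
    width = 2 * context + 1
    windows = np.stack([padded[i : i + width] for i in range(len(values))])
    return windows.mean(axis=1), windows.var(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 8000
    t = np.arange(fs) / fs
    tone = np.sin(2 * np.pi * 1000 * t)   # peaked spectrum
    noise = rng.standard_normal(fs)       # flat spectrum
    print("mean entropy, tone :", spectral_entropy(tone).mean())
    print("mean entropy, noise:", spectral_entropy(noise).mean())
```

A flat, noise-like spectrum gives an entropy close to the maximum (the base-2 logarithm of the number of frequency bins), while a spectrum concentrated in a few bins gives a low value. The running mean and variance then summarize how this quantity evolves across frames.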
Bibliographic reference. Subramanya, Amarnag / Bilmes, Jeff / Chen, Chia-Ping (2005): "Focused word segmentation for ASR", In INTERSPEECH-2005, 393-396.