We propose a new set of features based on the temporal statistics of the spectral entropy of speech, and we show why these features make good inputs for a speech detector. We also propose a back-end that uses the evidence from these features in a "focused" manner. Recognition experiments show that this back-end yields significant performance improvements, whereas merely appending the features to the standard feature vector does not improve performance. For the highly mismatched case in the Aurora3.0 corpus, we report a 10% average improvement in word error rate over our baseline.
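As a rough illustration of what "temporal statistics of the spectral entropy" could look like in practice, the sketch below computes a per-frame spectral entropy and then a running mean and variance over recent frames. It is not the authors' feature definition: the function names, the frame length, the hop size, the window of 5 frames, and the choice of mean and variance as the statistics are all assumptions made for this example. A naive DFT is used only to keep the sketch dependency-free.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum over the non-negative frequency bins."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_entropy(frame, eps=1e-12):
    """Shannon entropy (in bits) of the frame's normalized power spectrum."""
    power = power_spectrum(frame)
    total = sum(power) + eps  # eps guards against division by zero on silence
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def entropy_statistics(signal, frame_len=64, hop=32, window=5):
    """Per-frame (entropy, running mean, running variance) tuples.

    frame_len, hop, and window are illustrative values, not taken from the paper.
    """
    entropies = [
        spectral_entropy(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]
    stats = []
    for i in range(len(entropies)):
        recent = entropies[max(0, i - window + 1): i + 1]
        mean = sum(recent) / len(recent)
        var = sum((h - mean) ** 2 for h in recent) / len(recent)
        stats.append((entropies[i], mean, var))
    return stats
```

A frame whose energy sits in a single frequency bin gives an entropy near zero, while a frame with a flat spectrum gives the maximum value, the base-2 logarithm of the number of bins. Structured, speech-like spectra fall toward the low end of that range and noise-like spectra toward the high end, which is the intuition behind using entropy as evidence for a speech detector.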
Cite as: Subramanya, A., Bilmes, J., Chen, C.-P. (2005) Focused word segmentation for ASR. Proc. Interspeech 2005, 393-396, doi: 10.21437/Interspeech.2005-216
@inproceedings{subramanya05_interspeech,
  author={Amarnag Subramanya and Jeff Bilmes and Chia-Ping Chen},
  title={{Focused word segmentation for ASR}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={393--396},
  doi={10.21437/Interspeech.2005-216}
}