In this paper, we propose a robust speech detection algorithm that applies Heteroscedastic Discriminant Analysis (HDA) to the Time-Frequency Energy (TFE). The TFE consists of three components: the log energy in the time domain, the log energy in the fixed band 250-3500 Hz, and the log energies of the Mel-scale frequency bands. A bottom-up algorithm with automatic threshold adjustment is then used for accurate word boundary detection. Compared with algorithms based on the time-domain energy [1], the ATF parameter [2], and the energy together with the LDA-MFCC parameter [3], the proposed algorithm performs better under different types of noise.
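The three TFE components listed above can be illustrated with a short frame-level feature extractor. This is a minimal sketch, not the authors' implementation: the 8 kHz sampling rate, 256-sample frames with 50% overlap, Hamming window, 20 triangular Mel filters, and natural-log scaling are all assumptions, and the function names (`frame_signal`, `mel_filterbank`, `time_frequency_energy`) are hypothetical. The HDA projection and the bottom-up boundary search are not shown.

```python
import numpy as np


def hz_to_mel(f):
    # Common Mel-scale formula (an assumption; the paper may use another variant).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def frame_signal(x, frame_len, hop):
    # Split a 1-D signal into overlapping frames, shape (n_frames, frame_len).
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def mel_filterbank(n_filters, n_fft, sr, fmin=0.0, fmax=None):
    # Triangular filters spaced evenly on the Mel scale, shape (n_filters, n_fft//2 + 1).
    if fmax is None:
        fmax = sr / 2
    mel_pts = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def time_frequency_energy(x, sr=8000, frame_len=256, hop=128, n_mel=20, eps=1e-10):
    # Returns one TFE vector per frame: [log E_time, log E_250-3500Hz, log E_mel_1..n_mel].
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    # 1. Log energy in the time domain.
    log_e_time = np.log(np.sum(frames ** 2, axis=1) + eps)
    # Power spectrum of each Hamming-windowed frame.
    spec = np.abs(np.fft.rfft(frames * np.hamming(frame_len), n=frame_len, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    # 2. Log energy in the fixed 250-3500 Hz band.
    band = (freqs >= 250.0) & (freqs <= 3500.0)
    log_e_band = np.log(spec[:, band].sum(axis=1) + eps)
    # 3. Log energies of the Mel-scale frequency bands.
    log_mel = np.log(spec @ mel_filterbank(n_mel, frame_len, sr).T + eps)
    return np.column_stack([log_e_time, log_e_band, log_mel])
```

For one second of 8 kHz audio this yields 61 frames of 22-dimensional vectors. In the method described above, a trained HDA transform would then project each vector to a lower-dimensional, more discriminative space before the word boundary search; that training step is outside the scope of this sketch.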
[1] L. F. Lamel, L. R. Rabiner, A. E. Rosenberg, and J. G. Wilson, "An improved endpoint detector for isolated word recognition," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 29, pp. 777-785, Aug. 1981.
[2] G. D. Wu and C. T. Lin, "Speech detection with mel-scale frequency bank in noisy environment," IEEE Trans. Speech and Audio Processing, vol. 8, pp. 541-554, Sep. 2000.
[3] A. Martin, D. Charlet, and L. Mauuary, "Robust speech/non-speech detection using LDA applied to MFCC," Proc. ICASSP 2001, vol. 1, pp. 237-240, 2001.
Cite as: Tian, Y., Wang, Z., Lu, D. (2002) Robust speech detection with heteroscedastic discriminant analysis applied to the time-frequency energy. Proc. International Symposium on Chinese Spoken Language Processing, paper 88
@inproceedings{tian02_iscslp, author={Ye Tian and Zuoying Wang and Dajin Lu}, title={{Robust speech detection with heteroscedastic discriminant analysis applied to the time-frequency energy}}, year=2002, booktitle={Proc. International Symposium on Chinese Spoken Language Processing}, pages={paper 88} }