International Symposium on Chinese Spoken Language Processing (ISCSLP 2002)

Taipei, Taiwan
August 23-24, 2002

Robust Speech Detection with Heteroscedastic Discriminant Analysis Applied to the Time-Frequency Energy

Ye Tian, Zuoying Wang, Dajin Lu

Tsinghua University, Beijing, China

In this paper, we propose a robust speech detection algorithm that applies Heteroscedastic Discriminant Analysis (HDA) to the Time-Frequency Energy (TFE). The TFE consists of the log energy in the time domain, the log energy in the fixed band 250-3500 Hz, and the log energies of the Mel-scale frequency bands. A bottom-up algorithm with automatic threshold adjustment is used for accurate word boundary detection. Compared with algorithms based on the time-domain energy [1], the ATF parameter [2], and the energy together with the LDA-MFCC parameter [3], the proposed algorithm performs better under different types of noise.
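The abstract names the three parts of the TFE but gives no implementation details. The sketch below assembles one TFE vector for a single frame in plain Python. The following choices are our own assumptions and do not come from the paper: 8 kHz sampling, 256-sample frames, a Hamming window applied before the spectrum, 20 triangular Mel filters spanning 0-4000 Hz, natural logarithms, and a small floor (`EPS`) added before each log. All function names (`tfe_vector`, `mel_weights`, `power_spectrum`) are hypothetical. The HDA projection and the bottom-up boundary search are not shown.

```python
import cmath
import math

# Assumed parameters and helper names; none of these are taken from the paper.
EPS = 1e-10  # floor to avoid log(0) on silent frames


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """|DFT|^2 for bins 0..N/2 (naive O(N^2) DFT keeps the sketch dependency-free)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(acc) ** 2)
    return spec


def mel_weights(num_filters, n_fft, fs):
    """Triangular Mel filters evenly spaced on the Mel scale over 0..fs/2."""
    top = hz_to_mel(fs / 2.0)
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bin_hz = [k * fs / n_fft for k in range(n_fft // 2 + 1)]
    rows = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))  # rising edge
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))  # falling edge
            else:
                row.append(0.0)
        rows.append(row)
    return rows


def tfe_vector(frame, fs=8000, band=(250.0, 3500.0), num_filters=20):
    """[log time energy, log 250-3500 Hz energy, log Mel band energies...]."""
    n = len(frame)
    # Log energy in the time domain, computed on the raw samples.
    log_time = math.log(sum(x * x for x in frame) + EPS)
    # Hamming window before the DFT (assumed; the paper's windowing is not stated).
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
           for t, x in enumerate(frame)]
    spec = power_spectrum(win)
    # Log energy of DFT bins whose centre frequency lies in the fixed band.
    band_e = sum(p for k, p in enumerate(spec) if band[0] <= k * fs / n <= band[1])
    log_band = math.log(band_e + EPS)
    # Log energy of each Mel-scale band.
    log_mel = [math.log(sum(w * p for w, p in zip(row, spec)) + EPS)
               for row in mel_weights(num_filters, n, fs)]
    return [log_time, log_band] + log_mel


# Usage: one 256-sample frame of a 1 kHz tone at 8 kHz.
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(256)]
print(len(tfe_vector(tone)))  # 22 = 1 time-domain + 1 fixed-band + 20 Mel bands
```

The naive DFT only keeps the example self-contained; a practical front end would use an FFT. In the system the abstract describes, per-frame vectors of this kind are the input to the HDA projection, which precedes the thresholding stage.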

References

  1. L. F. Lamel, L. R. Rabiner, A. E. Rosenberg, and J. G. Wilson, "An improved endpoint detector for isolated word recognition," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 29, pp. 777-785, Aug. 1981.
  2. G. D. Wu and C. T. Lin, "Speech detection with mel-scale frequency bank in noisy environment," IEEE Trans. Speech and Audio Processing, vol. 8, pp. 541-554, Sep. 2000.
  3. A. Martin, D. Charlet, and L. Mauuary, "Robust speech/non-speech detection using LDA applied to MFCC," Proceedings of ICASSP 2001, vol. 1, pp. 237-240, 2001.



Bibliographic reference.  TIAN, Ye / WANG, Zuoying / LU, Dajin (2002): "Robust speech detection with heteroscedastic discriminant analysis applied to the time-frequency energy", In ISCSLP 2002, paper 88.