7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Entropy of Energy Operator as Feature for Large Vocabulary Mandarin Speaker Independent Speech Recognition

Fadhil H. T. Al-Dulaimy, Zuoying Wang

Tsinghua University, China

This work presents the nonlinear time-frequency distribution of the Discrete Time Energy Operator (DTEO), based on AM-FM demodulation techniques, under the assumption that the individual component signals are spectrally isolated from each other and can be modeled as discrete-time mono-component AM-FM signals. We propose using this distribution, together with its entropy, as input to a Duration Distribution Based Hidden Markov Model (DDBHMM) in a Speaker Independent Large Vocabulary Mandarin Speech Recognition (SI-LVMSR) system, by combining the feature vectors produced by the front-end detection stage. The goal is to improve the performance of the existing system by adding new features to the baseline feature vector. The pre-emphasis filter in the front end of the present recognizer increases the noise energy at high frequencies above 4 kHz, which in some cases degrades recognition accuracy. This paper also addresses this problem by eliminating the pre-emphasis filter and combining the entropy of the nonlinear DTEO (NLDTEO) with MFCC in place of the traditional techniques. Experimental results show a significant relative reduction in error rate (24.96%) when the new technique is used with the pre-emphasis filter (PEF) removed from the pre-processing stage.
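
As a rough illustration of the kind of feature described above, the sketch below computes the standard discrete-time Teager-Kaiser energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], and a per-frame entropy of its output. The abstract does not define the entropy, so the Shannon entropy of the normalised absolute energy is an assumption made here, as are the function names, the frame length, and the hop size; this is not necessarily the formulation used in the paper.

```python
import math


def teager_energy(x):
    """Discrete-time energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output is two samples shorter than the input because the
    operator needs one neighbour on each side.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def frame_entropy(energy, eps=1e-12):
    """Shannon entropy (bits) of the normalised absolute energy in a frame.

    Assumed definition: the abstract does not say how the entropy is formed.
    """
    mags = [abs(e) for e in energy]
    total = sum(mags)
    if total <= eps:
        return 0.0  # silent frame: no energy distribution to measure
    probs = [m / total for m in mags]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def entropy_features(signal, frame_len=256, hop=128):
    """One entropy value per frame (frame_len and hop are hypothetical)."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        feats.append(frame_entropy(teager_energy(frame)))
    return feats


if __name__ == "__main__":
    # For a pure tone A*cos(W*n) the operator returns the constant A^2 * sin(W)^2.
    tone = [2.0 * math.cos(0.3 * n) for n in range(512)]
    print(teager_energy(tone)[:3])
    print(entropy_features(tone))
```

In a recogniser along the lines the abstract describes, each per-frame entropy value would be appended to that frame's MFCC vector, and the signal would be passed in without pre-emphasis.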


Bibliographic reference. Al-Dulaimy, Fadhil H. T. / Wang, Zuoying (2002): "Entropy of energy operator as feature for large vocabulary Mandarin speaker independent speech recognition", in ICSLP-2002, 2105-2108.