International Symposium on Chinese Spoken Language Processing (ISCSLP 2002)

Taipei, Taiwan
August 23-24, 2002

Time-Frequency Distributions of Spectrum Energy Operator in Large Vocabulary Mandarin Speaker Independent Speech Recognition System

Fadhil H. T. Al-Dulaimy, Zuoying Wang

Tsinghua University, Beijing, China

The main task of this work is to improve the performance of an existing Speaker Independent Large Vocabulary Mandarin Speech Recognition System (SILVMSRS) in the acoustic and phonetic phases. New features are extracted and joined with the baseline feature vectors to increase the separation distance measure, especially between confused syllables. We demonstrate the effect on error rate reduction of using the Non-Linear Energy Operator (NLEO) distribution, which is based on AM-FM demodulation techniques, assuming that the individual component signals are spectrally isolated from each other and can be modeled as discrete-time mono-component AM-FM signals. NLEO is used as a feature in place of the traditional energy operator (TEO) method of computing the energy. We examine many parameters of its spectrum distribution in combination with MFCC as front-end detection parameters, together with acoustic modeling based on the Duration Distribution Based Hidden Markov Model (DDBHMM). The experiment shows the advantage of eliminating the pre-emphasis while using NLEO in the feature vectors instead of TEO. The Relative Average Error Rate Reduction (RAERR) improves as the number of candidates increases: 5.64%, 9.89%, and 19.26% when 1, 5, and 25 candidates are used, respectively. These results depend on carefully adjusting the way the parameters of the energy operator are computed, since those parameters are affected by the distribution of the signal components in the time-frequency space.
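As an illustration of the kind of operator discussed above, the following minimal sketch assumes the common discrete Teager-Kaiser form, psi[n] = x[n]^2 - x[n-1] * x[n+1]. The abstract does not state the exact NLEO definition used in the system, so this form, the function names `energy_operator` and `frame_energy`, and the test signal are assumptions made for illustration only. For a mono-component signal A cos(omega n + phi) with constant amplitude and frequency, this operator returns A^2 sin^2(omega) at every sample, which is why it responds to both amplitude and frequency.

```python
import math


def energy_operator(x):
    """Assumed Teager-Kaiser form: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample (the two end samples are dropped).
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def frame_energy(x):
    """Hypothetical per-frame feature: mean absolute operator output."""
    psi = energy_operator(x)
    return sum(abs(v) for v in psi) / len(psi)


# Synthetic mono-component test signal (illustrative values, not from the paper).
amplitude, omega, phase = 2.0, 0.3, 0.5
signal = [amplitude * math.cos(omega * n + phase) for n in range(200)]

psi = energy_operator(signal)
expected = amplitude ** 2 * math.sin(omega) ** 2  # closed form A^2 sin^2(omega)
feature = frame_energy(signal)
```

Because the output depends on both A and omega, a feature of this kind carries frequency-dependent information that a plain sum of squared samples does not, which is consistent with the abstract's remark that the operator parameters are affected by how the components are distributed in the time-frequency space.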

