Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

Using Wavelet Dyadic Grids and Neural Networks for Speech Recognition

Richard R. Favero, Fikret S. Gurgen

Speech Technology Research Group, Department of Electrical Engineering, University of Sydney, Australia

This paper describes two multi-rate feature vectors for speech recognition, both derived from wavelet transform coefficients. The feature vectors ensure time and frequency alignment across the dyadic grid. The first strategy composes the feature vector by grouping coefficients according to their location in time. This produces frame-synchronous data that can be applied to a recogniser without adding interpolated points to the dyadic grid. The second strategy composes groups of vectors according to frequency region and the sampling rate of each region. The feature vectors are then applied to a window-based neural network (WNN) to assess speech recognition performance. The WNNs are designed to enhance the resolution of various frequency bands to improve speech recognition performance. Experiments are performed using the words /b,d,g/. The results show that the performance of the WNN using this wavelet-based feature vector is comparable to that of the HMM-based system reported in [3,4].
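
The two grouping strategies can be illustrated with a minimal sketch. It is not the authors' implementation: the Haar wavelet, the function names (`haar_dyadic_grid`, `group_by_time`, `group_by_band`), and the choice of one frame per coarsest-level coefficient are all assumptions made for illustration, since the abstract does not specify the wavelet or the frame layout.

```python
import math


def haar_dyadic_grid(signal, levels):
    """Compute a Haar wavelet decomposition (assumed wavelet, for illustration).

    Returns (details, approx), where details[0] is the finest level and each
    later level holds half as many coefficients: the dyadic grid.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(signal)
    details = []
    scale = math.sqrt(2.0)
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / scale for a, b in pairs])
        approx = [(a + b) / scale for a, b in pairs]
    return details, approx


def group_by_time(details):
    """Hypothetical reading of the first strategy: group by location in time.

    Each frame spans one coefficient of the coarsest level and collects every
    finer-level coefficient that falls in the same time interval, so no
    interpolated points are needed to fill the grid.
    """
    levels = len(details)
    frames = []
    for t in range(len(details[-1])):
        frame = []
        for j, row in enumerate(details):
            span = 2 ** (levels - 1 - j)  # coefficients of level j per frame
            frame.extend(row[t * span:(t + 1) * span])
        frames.append(frame)
    return frames


def group_by_band(details, sample_rate):
    """Hypothetical reading of the second strategy: group by frequency region.

    Each level stays a separate stream, tagged with its own coefficient rate
    (the input sample rate halved once per level).
    """
    return [
        {"level": j + 1, "rate_hz": sample_rate / 2 ** (j + 1), "coeffs": row}
        for j, row in enumerate(details)
    ]
```

With a 16-sample input and three levels, `group_by_time` yields two frames of seven coefficients each (4 + 2 + 1), while `group_by_band` yields three streams of 8, 4 and 2 coefficients whose rates halve from one level to the next.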

Bibliographic reference. Favero, Richard R. / Gurgen, Fikret S. (1994): "Using wavelet dyadic grids and neural networks for speech recognition", in ICSLP-1994, 1539-1542.