Sixth International Conference on Spoken Language Processing
In this paper, we address the issue of making use of spectral peak location information in a speech recognition system. The cepstral features that are used in most speech recognition systems, though perceptually motivated, do not explicitly model spectral peak trajectory information, which is a valuable clue to identifying the underlying phone. We present a study that examines the utility of using this information in speech recognition, to augment the information present in the cepstra.
We propose a method based on bandpass filtering the speech signal using several filters with different passbands, and using an adaptive IIR filter to track the locations of the spectral peaks in each bandpass output. This method has the advantage that, along with the estimate of the spectral peak frequency, it also provides the energy at the spectral peaks (a feature that turns out to be quite informative). In initial experiments, the bandpass filters were chosen to correspond to the formant ranges; consequently, for voiced sounds, the locations of the spectral peaks are expected to correspond to the locations of the formants.
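As a rough illustration of the idea above, the following sketch bandpass filters a signal and then tracks the dominant peak in the band with a constrained second-order adaptive IIR notch filter. The abstract does not specify the filter structure or the adaptation rule, so everything here is an assumption: the function names `bandpass` and `track_peak`, the biquad bandpass design, the pole radius `r`, the step size `mu`, and the smoothing constant `lam` are all hypothetical choices, not the author's method. The peak energy is approximated by the smoothed power of the component that the notch removes.

```python
import math

def bandpass(x, fs, f0, q):
    """Cookbook-style biquad bandpass with unit peak gain at f0 (assumed design)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha / (1.0 + alpha), -alpha / (1.0 + alpha)   # b1 is zero
    a1 = -2.0 * math.cos(w0) / (1.0 + alpha)
    a2 = (1.0 - alpha) / (1.0 + alpha)
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def track_peak(x, fs, f_init, r=0.9, mu=0.05, lam=0.99):
    """Adaptive IIR notch: returns per-sample (frequency in Hz, energy) estimates.

    Notch: H(z) = (1 + a z^-1 + z^-2) / (1 + r a z^-1 + r^2 z^-2),
    with a = -2 cos(w) so that the notch sits at angular frequency w.
    The parameter a is adapted to minimise the notch output power
    (simplified, normalised gradient; an assumed update rule).
    """
    a = -2.0 * math.cos(2.0 * math.pi * f_init / fs)
    u1 = u2 = 0.0          # state of the all-pole section
    p = 0.0                # running power of the regressor u[n-1]
    energy = 0.0           # smoothed power of the removed (peak) component
    freqs, energies = [], []
    for xn in x:
        u = xn - r * a * u1 - r * r * u2      # all-pole section
        e = u + a * u1 + u2                   # notch output
        p = 0.9 * p + 0.1 * u1 * u1
        a -= mu * e * u1 / (p + 1e-12)        # normalised gradient step
        a = max(-1.999, min(1.999, a))        # keep acos argument valid
        energy = lam * energy + (1.0 - lam) * (xn - e) ** 2
        freqs.append(math.acos(-a / 2.0) * fs / (2.0 * math.pi))
        energies.append(energy)
        u2, u1 = u1, u
    return freqs, energies

if __name__ == "__main__":
    fs = 8000
    # Synthetic two-tone signal standing in for a voiced frame (illustrative only).
    sig = [math.sin(2 * math.pi * 700 * n / fs) + 0.5 * math.sin(2 * math.pi * 2300 * n / fs)
           for n in range(4000)]
    for centre in (500.0, 2000.0):            # hypothetical band centres
        f, en = track_peak(bandpass(sig, fs, centre, 1.0), fs, centre)
        print(centre, f[-1], en[-1])
```

With wide (low-Q) bands, energy leaking in from a neighbouring peak can bias the estimate, so in practice the passbands would need to be chosen to isolate one peak each, in line with the formant ranges mentioned above.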
We next investigated the utility of this spectral peak information in discriminating between the phones used in speech recognition. To quantify the information provided by the new features (over and above that provided by the cepstra), we measured the mutual information between the augmented feature vector (cepstra augmented with the new features) and the phonetic class labels, and compared it to the mutual information between the cepstra and the class labels. Finally, we experimented with feature fusion techniques, in which the new features were appended to the cepstra and a new speech recognition system was trained on the augmented features.
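The comparison described above can be sketched as follows. The abstract does not say how the mutual information was estimated, so this example assumes a simple plug-in (counting) estimate over features that have already been quantised to discrete symbols; the function names `mutual_information` and `fuse` and the toy data are hypothetical. Fusion is modelled as plain concatenation of the two feature vectors for each frame.

```python
import math
from collections import Counter

def mutual_information(features, labels):
    """Plug-in estimate of I(F; C) in bits for discrete (hashable) features."""
    n = len(labels)
    joint = Counter(zip(features, labels))
    count_f = Counter(features)
    count_c = Counter(labels)
    mi = 0.0
    for (f, c), k in joint.items():
        # p(f,c) * log2( p(f,c) / (p(f) p(c)) ), written in terms of counts
        mi += (k / n) * math.log2(k * n / (count_f[f] * count_c[c]))
    return mi

def fuse(base, extra):
    """Append the extra features to the base features, frame by frame."""
    return [tuple(b) + tuple(e) for b, e in zip(base, extra)]

if __name__ == "__main__":
    # Toy data (illustrative only): quantised baseline and peak features per frame.
    labels = ["aa", "aa", "iy", "iy"]
    cepstra = [(3,), (3,), (3,), (4,)]
    peaks = [(0,), (0,), (1,), (1,)]
    print(mutual_information(cepstra, labels))
    print(mutual_information(fuse(cepstra, peaks), labels))
```

A gain in the second value over the first indicates that the appended features carry class information not already present in the baseline. For real continuous-valued cepstra, the estimate depends on the quantisation or density model used, which is not specified here.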
Bibliographic reference. Padmanabhan, Mukund (2000): "Spectral peak tracking and its use in speech recognition", In ICSLP-2000, vol.4, 604-607.