Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Synergy of Spectral and Perceptual Features in Multi-Source Connectionist Speech Recognition

Roberto Gemello (1), Loreta Moisa (2), Pietro Laface (2)

(1) CSELT - Centro Studi e Laboratori Telecomunicazioni, Torino, Italy
(2) Politecnico di Torino - Dipartimento di Automatica e Informatica, Torino, Italy

The combined use of different sets of features, extracted from the speech signal by different processing algorithms, is a promising approach to improving speech recognition performance. Artificial Neural Networks are well suited to this task because they can directly use multiple heterogeneous input features and estimate a near-optimal combination of them for classification, without being constrained by a priori assumptions about the stochastic independence of the input sources. This work shows how we have taken advantage of these characteristics of Neural Networks to improve the recognition accuracy of our systems. In particular, three sets of input features have been considered as sources: Mel-based Cepstral Coefficients derived from the FFT spectrum, RASTA-PLP Cepstral Coefficients, and a set of features that describe the dynamics of the FFT power spectrum along the frequency dimension instead of the usual time dimension. The experimental results confirm the usefulness of the proposed feature-integration approach, which leads to a significant error reduction on both isolated and continuous speech recognition tasks on a large telephone speech test set.
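As a rough illustration of the integration idea, the Python sketch below builds one joint input vector per frame from several feature sources, which a connectionist recognizer could then take as its input. It assumes, without confirmation from the abstract, that the frequency-dynamics features can be approximated by applying the standard regression delta formula across the bins of each frame's log power spectrum instead of across frames. The function names `frequency_deltas` and `stack_streams` are hypothetical, and the toy values are made up rather than real MFCC or RASTA-PLP output.

```python
from typing import List, Sequence

Frame = List[float]


def frequency_deltas(log_spectrum: Sequence[float], width: int = 2) -> Frame:
    """Regression slope of one frame's log power spectrum across frequency bins.

    Assumption: the usual time-delta formula
        d_k = sum_{n=1..N} n * (c_{k+n} - c_{k-n}) / (2 * sum_{n=1..N} n^2)
    is applied along the frequency axis. Edge bins are replicated so the
    output has the same length as the input.
    """
    n_bins = len(log_spectrum)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    deltas = []
    for k in range(n_bins):
        acc = 0.0
        for n in range(1, width + 1):
            hi = log_spectrum[min(k + n, n_bins - 1)]  # clamp at top bin
            lo = log_spectrum[max(k - n, 0)]           # clamp at bottom bin
            acc += n * (hi - lo)
        deltas.append(acc / denom)
    return deltas


def stack_streams(*streams: Sequence[Sequence[float]]) -> List[Frame]:
    """Concatenate per-frame vectors from several feature sources.

    Each stream is a list of frames, and all streams must contain the same
    number of frames. No independence between sources is assumed here; the
    network is left to learn how to combine them.
    """
    n_frames = {len(s) for s in streams}
    if len(n_frames) != 1:
        raise ValueError("all streams must have the same number of frames")
    total = n_frames.pop()
    return [[x for stream in streams for x in stream[t]] for t in range(total)]


if __name__ == "__main__":
    # Toy two-frame example with made-up values.
    mfcc = [[1.0, 0.5], [0.9, 0.4]]
    rasta = [[0.2], [0.3]]
    spectrum = [[0.0, 1.0, 2.0, 3.0], [3.0, 3.0, 3.0, 3.0]]
    fdelta = [frequency_deltas(f, width=1) for f in spectrum]
    for vec in stack_streams(mfcc, rasta, fdelta):
        print(vec)
```

Plain concatenation is the simplest way to hand heterogeneous sources to a single network; in practice each stream would usually also be normalized and a context window of neighbouring frames appended, which this sketch omits.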


Bibliographic reference.  Gemello, Roberto / Moisa, Loreta / Laface, Pietro (2000): "Synergy of spectral and perceptual features in multi-source connectionist speech recognition", In ICSLP-2000, vol.2, 843-846.