Sixth International Conference on Spoken Language Processing
October 16-20, 2000
Synergy of Spectral and Perceptual Features in Multi-Source Connectionist Speech Recognition
Roberto Gemello (1), Loreta Moisa (2), Pietro Laface (2)
(1) CSELT - Centro Studi e Laboratori Telecomunicazioni,
The combined use of different sets of features, extracted from the
speech signal by different processing algorithms, is a promising
approach to improving speech recognition performance.
Artificial Neural Networks are well suited to this task because they
can directly use multiple heterogeneous input features and
estimate a near-optimal combination of them for classification,
without being constrained by a priori assumptions about the
stochastic independence of the input sources.
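To illustrate input-level (early) integration, here is a minimal sketch. It assumes that the per-frame vectors from each front-end are simply concatenated and fed to one multilayer perceptron whose softmax outputs are read as class posteriors. The abstract does not describe the network architecture. The function names (`concat_sources`, `mlp_posteriors`, `random_layer`), the feature dimensions, and the layer sizes are all hypothetical placeholders, and the untrained random weights stand in for a trained model.

```python
import math
import random


def concat_sources(*frames):
    """Concatenate per-frame feature vectors from several front-ends (hypothetical helper)."""
    out = []
    for f in frames:
        out.extend(f)
    return out


def mlp_posteriors(x, w1, b1, w2, b2):
    """One-hidden-layer MLP: sigmoid hidden units, softmax outputs read as posteriors."""
    hidden = []
    for row, b in zip(w1, b1):
        a = sum(wi * xi for wi, xi in zip(row, x)) + b
        hidden.append(1.0 / (1.0 + math.exp(-a)))
    logits = [sum(wi * hi for wi, hi in zip(row, hidden)) + b for row, b in zip(w2, b2)]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def random_layer(n_out, n_in, rng):
    """Small random weights and zero biases; placeholders for trained parameters."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


if __name__ == "__main__":
    rng = random.Random(0)
    mfcc = [rng.gauss(0, 1) for _ in range(13)]      # hypothetical 13-dim MFCC frame
    rasta = [rng.gauss(0, 1) for _ in range(13)]     # hypothetical 13-dim RASTA-PLP frame
    freq_dyn = [rng.gauss(0, 1) for _ in range(10)]  # hypothetical frequency-dynamics frame
    x = concat_sources(mfcc, rasta, freq_dyn)
    w1, b1 = random_layer(32, len(x), rng)
    w2, b2 = random_layer(5, 32, rng)
    posteriors = mlp_posteriors(x, w1, b1, w2, b2)
    print(len(x), round(sum(posteriors), 6))
```

The hidden units see the joint input vector. Training can therefore exploit correlations across the sources instead of assuming the streams are independent, as a product-of-likelihoods stream combination would.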
This work shows how we have exploited these characteristics of
Neural Networks to improve the recognition accuracy of our
systems. In particular, three sets of input features have been
considered as sources: Mel-based Cepstral Coefficients derived
from the FFT spectrum, RASTA-PLP Cepstral Coefficients, and a
set of features that describe the dynamics of the FFT power
spectrum along the frequency dimension rather than the usual
time dimension.
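The abstract does not define the frequency-dimension dynamics features exactly. The sketch below rests on one plausible reading, which is an assumption. It takes the standard regression (delta) formula normally applied across time frames, d_k = sum_{n=1..N} n (S[k+n] - S[k-n]) / (2 sum_{n=1..N} n^2), and applies it across the bins k of a single frame's log power spectrum, replicating values at the edges. The names `power_spectrum`, `frequency_deltas` and `width` are hypothetical. A naive DFT replaces the FFT only to keep the example dependency-free.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum |X[k]|^2 for k = 0..N/2 (an FFT would be used in practice)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(acc) ** 2)
    return spec


def frequency_deltas(values, width=2):
    """Regression slope of `values` along the frequency index; edge bins are replicated."""
    m = len(values)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for k in range(m):
        num = 0.0
        for n in range(1, width + 1):
            upper = values[min(k + n, m - 1)]
            lower = values[max(k - n, 0)]
            num += n * (upper - lower)
        out.append(num / denom)
    return out


if __name__ == "__main__":
    n = 64
    # Hamming-windowed test tone at bin 8 (illustrative input only)
    frame = [math.sin(2 * math.pi * 8 * t / n) * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
             for t in range(n)]
    log_spec = [math.log(p + 1e-10) for p in power_spectrum(frame)]
    d = frequency_deltas(log_spec)
    print(len(log_spec), len(d))
```

Each frame then yields one delta per spectral bin, describing how the log spectrum rises or falls across frequency. Time-domain deltas, by contrast, describe how each bin changes from frame to frame.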
The experimental results confirm the usefulness of the proposed
feature integration approach, which leads to a significant error
reduction on both isolated and continuous speech recognition
tasks on a large telephone speech test set.
(2) Politecnico di Torino - Dipartimento di Automatica e Informatica,
Gemello, Roberto / Moisa, Loreta / Laface, Pietro (2000):
"Synergy of spectral and perceptual features in multi-source connectionist speech recognition",
In ICSLP-2000, vol.2, 843-846.