Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Experiments in Constrained Maximum Likelihood Extraction of Temporal Features for Speech Recognition

Gilles Boulianne, Julie Brousseau, Nathalie Talbot, Pierre Dumouchel

Centre de recherche informatique de Montréal, Québec, Canada

Input features that capture speech dynamics have frequently been proposed to improve recognition accuracy. A broad class of such features can be obtained by applying a linear projection to a window spanning successive feature vectors. The linear projection can be directly compared to conventional modeling schemes when it is optimized according to a maximum likelihood criterion. On a large acoustic training database of conversational telephone speech, phoneme errors were reduced by 5.5% and word errors by 6% using maximum likelihood temporal features. Smaller databases were subject to undertraining, and no significant improvement in error rates was observed.
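
The idea of a linear projection applied to a window of successive feature vectors can be illustrated with a minimal Python sketch. It is not the method of the paper: the projection here is a fixed, hand-picked matrix that computes a simple delta feature, whereas the paper optimizes the projection under a maximum likelihood criterion. The function names `stack_window` and `project`, the edge padding by frame repetition, and the toy data are all assumptions made for illustration.

```python
def stack_window(frames, context):
    """Concatenate each frame with `context` neighbours on each side.

    Assumption: edges are padded by repeating the first or last frame.
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        window = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp index at the edges
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


def project(stacked, matrix):
    """Apply a linear projection; each row of `matrix` yields one output feature."""
    return [
        [sum(w * x for w, x in zip(row, vec)) for row in matrix]
        for vec in stacked
    ]


# Toy example: one-dimensional frames and a window of three frames.
frames = [[1.0], [2.0], [4.0], [7.0]]

# Hand-picked projection (illustrative only): (x[t+1] - x[t-1]) / 2,
# i.e. a simple delta feature expressed as a projection of the window.
delta_matrix = [[-0.5, 0.0, 0.5]]

features = project(stack_window(frames, context=1), delta_matrix)
print(features)
```

Conventional delta features are one special case of such a projection; replacing `delta_matrix` with a matrix estimated from data yields other members of the same class of temporal features.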


Bibliographic reference.  Boulianne, Gilles / Brousseau, Julie / Talbot, Nathalie / Dumouchel, Pierre (1999): "Experiments in constrained maximum likelihood extraction of temporal features for speech recognition", In EUROSPEECH'99, 1083-1086.