EUROSPEECH 2003 - INTERSPEECH 2003
This paper summarizes some of the robust feature extraction and acoustic modeling technologies used at Multitel, together with their assessment on some of the ETSI Aurora reference tasks. Ongoing work and directions for further research are also presented. For feature extraction (FE), we use PLP coefficients. Additive and convolutional noise are addressed with a cascade of spectral subtraction and temporal trajectory filtering. For acoustic modeling (AM), artificial neural networks (ANNs) are used to estimate the HMM state probabilities. At the junction of FE and AM, the multi-band structure addresses robustness by targeting both processing levels. Robust features can be extracted within sub-bands using a form of discriminant analysis; in this work, this is done with sub-band ANN acoustic models. The robust sub-band features are then used to estimate the state probabilities. These systems are evaluated on the Aurora tasks and compared with the existing ETSI features. Our baseline system performs similarly to the ETSI advanced features coupled with the HTK back-end. On the Aurora 3 tasks, the multi-band system outperforms the best ETSI results, with an average relative word error rate reduction of about 62% with respect to the baseline ETSI system and about 18% with respect to the advanced ETSI system. This confirms previous positive experience with the multi-band architecture on other databases.
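To make the noise-compensation cascade concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not specify the spectral subtraction variant, the noise estimator, the trajectory filter, or where in the PLP chain these steps are applied. As assumptions, the noise spectrum is estimated from the first few frames (taken to be speech-free), the over-subtraction factor `alpha` and floor `beta` are illustrative values, and a first-order DC-blocking high-pass filter (RASTA-like in spirit) stands in for the temporal trajectory filter. Both steps operate on a generic short-time power spectrum; the function names are hypothetical.

```python
import numpy as np


def spectral_subtraction(power_spec, n_noise_frames=10, alpha=2.0, beta=0.01):
    """Power spectral subtraction against additive noise.

    power_spec: array of shape (n_frames, n_bins).
    Assumption: the first n_noise_frames contain noise only, so their mean
    is used as a stationary noise estimate.
    """
    noise = power_spec[:n_noise_frames].mean(axis=0)
    cleaned = power_spec - alpha * noise  # over-subtract the noise estimate
    floor = beta * noise                  # spectral floor avoids negative power
    return np.maximum(cleaned, floor)


def trajectory_filter(log_spec, pole=0.94):
    """High-pass filter each log-spectral trajectory along time.

    A fixed convolutional channel multiplies the power spectrum, i.e. adds a
    constant per bin in the log domain. The DC-blocking filter
    y[t] = x[t] - x[t-1] + pole * y[t-1] removes that constant offset.
    This filter is a stand-in; the paper's actual filter is not given here.
    """
    out = np.zeros_like(log_spec)
    prev_x = log_spec[0]
    prev_y = np.zeros(log_spec.shape[1])
    for t in range(log_spec.shape[0]):
        y = log_spec[t] - prev_x + pole * prev_y
        out[t] = y
        prev_x = log_spec[t]
        prev_y = y
    return out


def robust_front_end(power_spec):
    """Cascade: spectral subtraction, log compression, trajectory filtering."""
    cleaned = spectral_subtraction(power_spec)
    return trajectory_filter(np.log(cleaned))


# Usage: a frequency-dependent channel gain leaves the output unchanged,
# because it scales the noise estimate and the floor by the same factor and
# becomes a per-bin constant after the log, which the filter then removes.
rng = np.random.default_rng(0)
spec = rng.gamma(2.0, 1.0, size=(50, 8))
channel = np.linspace(0.5, 2.0, 8)
print(np.allclose(robust_front_end(spec), robust_front_end(spec * channel)))
```

Ordering spectral subtraction before the log reflects that additive noise is additive in the power domain, while the channel only becomes additive after log compression, which is why the trajectory filter comes second in this sketch.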
Bibliographic reference. Dupont, Stephane / Ris, Christophe (2003): "Robust feature extraction and acoustic modeling at Multitel: experiments on the Aurora databases", In EUROSPEECH-2003, 1789-1792.