EUROSPEECH 2003 - INTERSPEECH 2003
8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003

        

Covariation and Weighting of Harmonically Decomposed Streams for ASR

Philip J.B. Jackson (1), David M. Moreno (2), Martin J. Russell (3), Javier Hernando (2)

(1) University of Surrey, U.K.
(2) Universitat Politecnica de Catalunya, Spain
(3) University of Birmingham, U.K.

Decomposition of speech signals into simultaneous streams of periodic and aperiodic information has been successfully applied to speech analysis, enhancement, modification and, recently, recognition. This paper examines the effect of different weightings of the two streams in a conventional HMM system, in digit recognition tests on the Aurora 2.0 database. Under clean test conditions, using matched weights during both training and testing gave a small improvement of approximately 10% relative to applying the weights during testing only. Principal component analysis of the covariation amongst the periodic and aperiodic features indicated that only 45 (51) of the 78 coefficients were required to account for 99% of the variance, for clean (multi-condition) training, which yielded an 18.4% (10.3%) absolute increase in accuracy with respect to the baseline. These findings provide further evidence of the potential for harmonically decomposed streams to improve performance and, substantially, to enhance recognition accuracy in noise.
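The variance criterion mentioned in the abstract (keeping just enough principal components to account for 99% of the variance) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `components_for_variance`, the use of NumPy, and the assumption that features arrive as a frames-by-coefficients array are all choices made here for illustration.

```python
import numpy as np


def components_for_variance(features, target=0.99):
    """Smallest number of principal components whose cumulative share
    of the total variance reaches `target`.

    `features` is assumed to be a 2-D array of shape (frames, coefficients),
    e.g. periodic and aperiodic coefficients concatenated per frame.
    """
    # Sample covariance across the coefficient dimensions.
    cov = np.cov(features, rowvar=False)
    # eigvalsh returns eigenvalues in ascending order; reverse to descending.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    # Guard against tiny negative values caused by round-off.
    eigvals = np.clip(eigvals, 0.0, None)
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # First index at which the cumulative share reaches the target.
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, eigvals.size)
```

Applied to a 78-dimensional feature matrix, a function of this kind would return the count of components to retain; the figures 45 and 51 quoted in the abstract are the paper's results and are not reproduced by this sketch.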

Bibliographic reference.  Jackson, Philip J.B. / Moreno, David M. / Russell, Martin J. / Hernando, Javier (2003): "Covariation and weighting of harmonically decomposed streams for ASR", In EUROSPEECH-2003, 2321-2324.