INTERSPEECH 2004 - ICSLP
Speech production is essentially a nonlinear dynamic process. Motivated by ideas from dynamical systems research, this paper recasts the speech representation (front-end) problem as an attempt to reconstruct the phase space of the production process, that is, the articulatory configurations. We point out that the delta and double delta features common in current ASR (Automatic Speech Recognition) systems correspond to time-delayed embedding, a technique used in nonlinear time series analysis for phase space reconstruction. The traditional delta and double delta features also impose a suboptimal linear transform on the reconstructed space. We show that a significant improvement in recognition accuracy can be achieved by choosing the transform in a data-driven fashion.
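The correspondence between delta features and time-delayed embedding can be illustrated with a minimal sketch. It assumes the widely used regression formula for deltas, d_t = sum_{n=1..k} n (x_{t+n} - x_{t-n}) / (2 sum_{n=1..k} n^2), with edge frames padded by repetition; the abstract does not state which delta formula or window the paper uses. All function names are hypothetical. The data-driven alternative shown here is plain PCA, chosen only as a simple stand-in, since the abstract does not name the method actually used.

```python
import numpy as np

def delay_embed(x, k):
    """Time-delay embedding: stack frames t-k..t+k for every frame t.

    x has shape (T, D); edges are padded by repeating the first/last frame.
    Returns an array of shape (T, (2k+1)*D).
    """
    T = x.shape[0]
    padded = np.pad(x, ((k, k), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + T] for i in range(2 * k + 1)])

def static_plus_delta_transform(k, D):
    """Fixed linear map from the embedding to [static, delta] features."""
    offsets = np.arange(-k, k + 1)
    # Regression weights n / (2 * sum n^2) for offsets n = -k..k (assumed formula).
    w = offsets / (2.0 * np.sum(np.arange(1, k + 1) ** 2))
    center = np.zeros(2 * k + 1)
    center[k] = 1.0  # picks out the undelayed frame x[t]
    # The Kronecker product applies the same per-offset weight to every dimension.
    static = np.kron(center[:, None], np.eye(D))
    delta = np.kron(w[:, None], np.eye(D))
    return np.hstack([static, delta])  # shape ((2k+1)*D, 2*D)

def delta_direct(x, k):
    """Delta features computed directly from the regression formula."""
    T = x.shape[0]
    padded = np.pad(x, ((k, k), (0, 0)), mode="edge")
    out = np.zeros(x.shape, dtype=float)
    for n in range(1, k + 1):
        out += n * (padded[k + n:k + n + T] - padded[k - n:k - n + T])
    return out / (2.0 * sum(n * n for n in range(1, k + 1)))

def pca_transform(embedded, out_dim):
    """Data-driven linear map (PCA stand-in): top principal directions."""
    centered = embedded - embedded.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:out_dim].T  # shape ((2k+1)*D, out_dim)
```

Multiplying the embedded frames by the matrix from `static_plus_delta_transform` reproduces the static and delta features exactly, which shows that the conventional front-end is one particular fixed linear transform of the embedding. Double deltas, obtained by applying the same regression to the deltas, are likewise a fixed linear map over a wider window. Replacing the fixed matrix with one estimated from data, as `pca_transform` does here, is the kind of substitution the abstract describes.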
Bibliographic reference. Yu, Hua (2004): "Phase-space representation of speech", In INTERSPEECH-2004, 909-912.