8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Phase-Space Representation of Speech

Hua Yu

Carnegie Mellon University, USA

Speech production is essentially a nonlinear dynamic process. Motivated by ideas from dynamical systems research, this paper recasts the speech representation (front-end) problem as an attempt to reconstruct the phase space of the production process, that is, its articulatory configurations. We point out that the delta and double-delta features common in current ASR (Automatic Speech Recognition) systems correspond to time-delay embedding, a technique from nonlinear time series analysis for phase-space reconstruction. The traditional delta and double-delta features also impose a suboptimal linear transform on the reconstructed space. We show that a significant improvement in recognition accuracy can be achieved by choosing the transform in a data-driven fashion.
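The correspondence between delta features and time-delay embedding can be illustrated with a minimal sketch. It assumes the widely used regression formula for delta coefficients, a window of k frames on each side, and edge padding at utterance boundaries; none of these settings is stated in the abstract. The function names (`delay_embed`, `delta_matrix`, `delta_direct`, `pca_transform`) are hypothetical, and PCA is only a stand-in for a data-driven transform, since the abstract does not say which transform the paper uses.

```python
import numpy as np

def delay_embed(frames, k):
    """Time-delay embedding: stack each frame with its k left and k right
    neighbours. Edge padding is an assumption made for this sketch."""
    T = frames.shape[0]
    padded = np.pad(frames, ((k, k), (0, 0)), mode="edge")
    # Row t holds [x[t-k], ..., x[t], ..., x[t+k]], flattened to (2k+1)*d values.
    return np.hstack([padded[i:i + T] for i in range(2 * k + 1)])

def delta_matrix(d, k):
    """Fixed linear map from an embedded window to [static, delta] features."""
    offsets = np.arange(-k, k + 1)
    weights = offsets / np.sum(offsets ** 2)   # regression-formula delta weights
    static = np.zeros(2 * k + 1)
    static[k] = 1.0                            # selects the centre frame
    # The Kronecker product applies the same weights to every feature dimension.
    return np.vstack([np.kron(static, np.eye(d)), np.kron(weights, np.eye(d))])

def delta_direct(frames, k):
    """Delta coefficients computed directly, used here only as a cross-check."""
    T = frames.shape[0]
    padded = np.pad(frames, ((k, k), (0, 0)), mode="edge")
    num = sum(n * (padded[k + n:k + n + T] - padded[k - n:k - n + T])
              for n in range(1, k + 1))
    return num / (2 * sum(n * n for n in range(1, k + 1)))

def pca_transform(embedded, n_out):
    """Illustrative data-driven alternative: top principal directions of the
    embedded vectors. PCA is an assumption, not the paper's stated method."""
    centered = embedded - embedded.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_out]

# Synthetic stand-in for cepstral frames: 50 frames, 3 dimensions.
rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 3))
embedded = delay_embed(frames, k=2)            # shape (50, 15)
fixed = embedded @ delta_matrix(3, 2).T        # static + delta, shape (50, 6)
learned = embedded @ pca_transform(embedded, 6).T  # same size, data-driven
```

In this sketch both feature sets are linear projections of the same embedded vectors; the only difference is whether the projection matrix is fixed in advance or estimated from data.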


Bibliographic reference. Yu, Hua (2004): "Phase-space representation of speech", in INTERSPEECH-2004, 909-912.