The articulators of the human speech production mechanism can move only relatively slowly. As a result, the acoustic properties of speech sounds mostly change continuously and gradually over time. However, such continuity constraints are seldom exploited for discriminating between phones. To explore to what extent continuity information can improve phone discrimination, we investigated a multi-frame MFCC representation combined with a supervised dimensionality reduction method that aims to find a low-dimensional representation that best separates the phones. The speech continuity information is encoded by a second-order smoothness regularizer. Experimental results on TIMIT phone classification show that the regularizer helps to distinguish vowels, but fails to improve the discrimination of consonants.
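To make the idea of a second-order smoothness regularizer concrete, the following is a minimal sketch of one common form: the sum of squared second differences along a sequence of frame vectors. The abstract does not give the exact formulation used in the paper, so the function name `second_order_penalty`, the penalty form, and the toy trajectories are all assumptions for illustration only.

```python
def second_order_penalty(trajectory):
    """Sum of squared second differences over a sequence of frame vectors.

    trajectory: list of T frames, each a list of d coefficients (e.g. MFCCs).
    Returns sum over interior frames t of ||x[t-1] - 2*x[t] + x[t+1]||^2.
    With fewer than 3 frames there are no interior frames, so the result is 0.
    NOTE: an assumed, generic penalty; not necessarily the paper's regularizer.
    """
    total = 0.0
    for t in range(1, len(trajectory) - 1):
        prev_f, cur_f, next_f = trajectory[t - 1], trajectory[t], trajectory[t + 1]
        for a, b, c in zip(prev_f, cur_f, next_f):
            total += (a - 2.0 * b + c) ** 2
    return total


# Hypothetical 5-frame, 2-coefficient trajectories (made-up numbers, not TIMIT data).
smooth = [[0.0, 1.0], [1.0, 1.5], [2.0, 2.0], [3.0, 2.5], [4.0, 3.0]]  # linear drift
jagged = [[0.0, 1.0], [2.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 1.0]]  # frame-to-frame jumps

smooth_penalty = second_order_penalty(smooth)  # linear trajectory: no curvature
jagged_penalty = second_order_penalty(jagged)  # oscillating trajectory: large curvature
```

Under this assumed form, a trajectory that drifts linearly across frames incurs no penalty, while one that jumps back and forth between frames is penalized heavily, which is the sense in which such a term favors gradually changing representations.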
Index Terms: dimensionality reduction; contextual representation; TIMIT; regularization; Laplacian smoothing
Bibliographic reference. Huang, Heyun / Bosch, Louis ten / Cranen, Bert / Boves, Lou (2012): "Exploring discriminative speech trajectory structures", In INTERSPEECH-2012, 1796-1799.