Interspeech'2005 - Eurospeech
A vector autoregressive (VAR) model is used in the auditory time-frequency domain to predict spectral changes. Forward and backward prediction errors increase at phone boundaries. These error signals are then used to study and detect the boundaries with the largest changes, which allow the most reliable automatic segmentation. The fully unsupervised method yields segments consisting of a variable number of phones. The performance of the method was tested with a set of 150 Finnish sentences pronounced by one female and two male speakers. The performance for English was tested using the TIMIT core test set. The boundaries between stops and vowels, in particular, are detected with high probability and precision.
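The idea of locating boundaries from VAR prediction errors can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the model order, the fitting procedure, or how the two error signals are combined. The sketch assumes a single least-squares fit over the whole utterance, an order of `p = 1`, a simple sum of forward and backward error norms, and a synthetic two-segment feature matrix standing in for auditory spectra. The names `fit_var`, `prediction_error`, and `boundary_score` are hypothetical.

```python
import numpy as np

def _lagged(X, p):
    """Stack the p previous frames [x[t-1], ..., x[t-p]] for t = p..T-1."""
    T = X.shape[0]
    return np.hstack([X[p - k:T - k] for k in range(1, p + 1)])

def fit_var(X, p=1):
    """Least-squares fit of x[t] ~ [x[t-1], ..., x[t-p]] @ coeffs (no intercept)."""
    coeffs, *_ = np.linalg.lstsq(_lagged(X, p), X[p:], rcond=None)
    return coeffs

def prediction_error(X, coeffs, p=1):
    """Norm of the one-step prediction residual per frame (0 for the first p frames)."""
    err = np.zeros(X.shape[0])
    err[p:] = np.linalg.norm(X[p:] - _lagged(X, p) @ coeffs, axis=1)
    return err

def boundary_score(X, p=1):
    """Sum of forward and backward prediction error norms (assumed combination)."""
    forward = prediction_error(X, fit_var(X, p), p)
    Xr = X[::-1]  # time-reversed frames give the backward predictor
    backward = prediction_error(Xr, fit_var(Xr, p), p)[::-1]
    return forward + backward

# Synthetic stand-in for a spectrogram: 200 frames, 4 bands,
# with an abrupt spectral change at frame 100.
rng = np.random.default_rng(0)
X = np.zeros((200, 4))
X[:100, 0] = 1.0
X[100:, 1] = 1.0
X += 0.01 * rng.standard_normal(X.shape)

score = boundary_score(X)
peak = int(np.argmax(score))  # frame with the largest combined error
```

In this toy setting the combined error is largest next to the frame where the spectrum changes, which mirrors the behaviour the abstract describes at phone boundaries. Real speech would need an auditory front end and some rule for selecting multiple peaks, neither of which is covered here.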
Bibliographic reference. Korhonen, Petri / Laine, Unto K. (2005): "Unsupervised segmentation of continuous speech using vector autoregressive time-frequency modeling errors", In INTERSPEECH-2005, 661-664.