We describe a discriminative algorithm for automatic measurement of voice onset time (VOT), treated as an instance of structured output prediction from speech. In contrast to previous studies, which use customized rules, our approach trains a function on manually labeled examples, using an online algorithm, to predict the burst and voicing onsets (and hence VOT). The feature set is tailored to detecting the burst and voicing onsets, and the loss function used in training is the difference between predicted and actual VOT. Applied to initial voiceless stops from two corpora, the algorithm compares favorably to previous work, and the agreement between automatic and manual measurements is near human inter-judge reliability.
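As an illustration of the prediction and loss described above, the following is a minimal sketch rather than the implementation used in the paper. It assumes onsets are expressed as frame indices, takes the loss to be the absolute difference between predicted and actual VOT, and finds the best onset pair by exhaustive search under a caller-supplied scoring function. The names `vot_loss`, `predict_onsets`, `toy_score`, and the `min_vot` parameter are hypothetical, and the feature maps and online training update are omitted.

```python
from typing import Callable, Tuple

# An onset pair: (burst_frame, voicing_frame). Frame indexing is an assumption.
Onsets = Tuple[int, int]


def vot(onsets: Onsets) -> int:
    """VOT in frames: the voicing onset minus the burst onset."""
    burst, voicing = onsets
    return voicing - burst


def vot_loss(predicted: Onsets, actual: Onsets) -> int:
    """Absolute difference between predicted and actual VOT (assumed form)."""
    return abs(vot(predicted) - vot(actual))


def predict_onsets(
    n_frames: int,
    score: Callable[[Onsets], float],
    min_vot: int = 1,
) -> Onsets:
    """Return the highest-scoring onset pair with the burst before the voicing.

    `score` stands in for the trained function; here it is any callable
    that maps an onset pair to a real number.
    """
    candidates = [
        (burst, voicing)
        for burst in range(n_frames)
        for voicing in range(burst + min_vot, n_frames)
    ]
    return max(candidates, key=score)


def toy_score(onsets: Onsets) -> float:
    """Hypothetical scorer that peaks at burst frame 3 and voicing frame 8."""
    burst, voicing = onsets
    return -(abs(burst - 3) + abs(voicing - 8))


predicted = predict_onsets(12, toy_score)
print(predicted)                    # (3, 8)
print(vot_loss(predicted, (2, 9)))  # 2
```

Note that a loss defined on VOT alone does not penalize a prediction whose two onsets are shifted by the same amount, since the difference between them is unchanged.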
Bibliographic reference. Sonderegger, Morgan / Keshet, Joseph (2010): "Automatic discriminative measurement of voice onset time", In INTERSPEECH-2010, 2242-2245.