Landmark based recognition of unvoiced word-initial stops is investigated. The relative effectiveness of acoustic-phonetic attributes versus more global spectral shape features is experimentally evaluated for four-way place classification of unvoiced, unaspirated stops. Various feature sets derived from the burst and vocalic transition regions of word initial consonants are compared via GMM based classification under speaker, gender, and vowel-context variability. While a set of acoustic attributes derived from the burst shows the best invariance to vowel context, it is found that global spectral shape features provide the most robust representation of the vocalic transition region by overcoming the problem of errors in explicit formant tracking. A combination of features from the burst and vocalic regions was superior to burst-only cues, but still far from the near perfect identification achieved in human perception.
Bibliographic reference. Karjigi, Veena / Rao, Preeti (2008): "Landmark based recognition of stops: acoustic attributes versus smoothed spectra", In INTERSPEECH-2008, 1550-1553.