INTERSPEECH 2012
13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Toward an Optimum Feature Set and HMM Model Parameters for Automatic Phonetic Alignment of Spontaneous Speech

Montri Karnjanadecha (1), Stephen A. Zahorian (2)

(1) Department of Computer Engineering, Faculty of Engineering, Prince of Songkla University, Hat Yai, Songkhla, Thailand
(2) Department of Electrical and Computer Engineering, Binghamton University, Binghamton, NY, USA

Many speech segmentation techniques have been proposed to automate phonetic alignment. However, most of them require labeled training data and perform well only on high-quality read speech. Automatic phonetic alignment of lower-quality, varied speech for which no labeled training data are available, the subject of this paper, is a much more challenging task. In this study, an HMM-based automatic speech recognizer was used to determine the phonetic sequences and boundaries of "open source" speech data retrieved from public websites. The HMMs were initially trained on the TIMIT database and subsequently adapted to each passage. Standard front-end features such as MFCC, LPCC, and PLP, as well as features computed by applying the DCT directly to the short-time spectrum (DCTC), were evaluated on TIMIT data. The "best" parameter set was found to be DCTC_78, and these parameters were used to align the speech data of interest.
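
The abstract distinguishes DCTC features from MFCCs by where the DCT is applied: directly to the short-time spectrum rather than to mel filterbank outputs. The following is a minimal sketch of that core idea for a single frame, written in pure Python for illustration. It is an assumption-laden simplification, not the authors' front end: the published DCTC method may include further steps (for example frequency warping and encoding of feature trajectories over time) that are omitted here, the function names are hypothetical, and the choice of 13 coefficients is arbitrary and does not reproduce the DCTC_78 configuration.

```python
import math
from typing import List


def log_magnitude_spectrum(frame: List[float], eps: float = 1e-10) -> List[float]:
    """Apply a Hamming window, then return log|DFT| for bins 0..N/2.

    Uses a naive O(N^2) DFT, which is fine for a short illustration.
    """
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        # eps avoids log(0) for silent frames
        spectrum.append(math.log(math.hypot(re, im) + eps))
    return spectrum


def dct_coefficients(log_spec: List[float], num_coeffs: int) -> List[float]:
    """Unnormalized DCT-II of the log spectrum, keeping the first num_coeffs terms."""
    m = len(log_spec)
    return [sum(s * math.cos(math.pi * j * (k + 0.5) / m) for k, s in enumerate(log_spec))
            for j in range(num_coeffs)]


def dctc_frame(frame: List[float], num_coeffs: int = 13) -> List[float]:
    """Hypothetical helper: DCT applied directly to the log short-time spectrum
    (no mel filterbank, no frequency warping, no time-trajectory terms)."""
    return dct_coefficients(log_magnitude_spectrum(frame), num_coeffs)


if __name__ == "__main__":
    sample_rate = 16000
    # 25 ms frame (400 samples) of a 1 kHz tone
    frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(400)]
    features = dctc_frame(frame, num_coeffs=13)
    print(len(features))  # 13
```

Keeping only the low-order DCT terms retains the smooth spectral envelope and discards fine harmonic detail, which is the same motivation behind cepstral features such as MFCC.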

Index Terms: speech segmentation, phonetic alignment, speech recognition


Bibliographic reference: Karnjanadecha, Montri / Zahorian, Stephen A. (2012): "Toward an optimum feature set and HMM model parameters for automatic phonetic alignment of spontaneous speech", In INTERSPEECH-2012, 2290-2293.