ISCA Archive ICSLP 2000

Segmentation of a speech waveform according to glottal open and closed phases using an autoregressive-HMM

Gavin Smith, Tony Robinson

This paper presents an algorithm to segment speech according to glottal open and closed phases using the time waveform alone. From this segmentation, pitch, jitter and closed-to-open glottal ratios can be computed. Segmentation is achieved by identifying spectral changepoints at the sub-pitch-period timescale. Changepoints are identified using a 3-state autoregressive hidden Markov model (AR-HMM) operating on the time waveform, with the Liljencrants-Fant (LF) glottal model as a theoretical basis. The model parameters and the optimal state sequence are determined using the expectation-maximisation (EM) algorithm and a bounded state duration (BSD) Viterbi algorithm, respectively. Experiments on synthetic speech give encouraging glottal segmentation for modal, fry and breathy voice types. Experiments on real speech from TIMIT also give meaningful segmentations.
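
As an illustration of how pitch, jitter and the closed-to-open ratio might be derived once a segmentation is available, the following is a minimal sketch, not the authors' implementation. It assumes the segmentation has already been reduced to two sorted lists of sample indices, glottal closure instants and glottal opening instants; the function name `glottal_measures`, its arguments, and the choice of "local" jitter (mean absolute difference of consecutive periods divided by the mean period) are assumptions made for this example.

```python
from statistics import mean


def glottal_measures(closures, openings, sample_rate):
    """Estimate pitch, jitter and closed-to-open ratio from phase boundaries.

    closures    -- sample indices where each closed phase starts (sorted)
    openings    -- sample indices where each open phase starts; openings[i]
                   is assumed to lie between closures[i] and closures[i + 1]
    sample_rate -- sampling frequency in Hz

    All three names are hypothetical; the paper does not define this interface.
    """
    if len(closures) < 2:
        raise ValueError("need at least two closure instants")

    # One pitch period spans consecutive closure instants.
    periods = [(b - a) / sample_rate for a, b in zip(closures, closures[1:])]
    mean_period = mean(periods)
    pitch_hz = 1.0 / mean_period

    # Local jitter (assumed definition): mean absolute change between
    # neighbouring periods, relative to the mean period.
    diffs = [abs(q - p) for p, q in zip(periods, periods[1:])]
    jitter = mean(diffs) / mean_period if diffs else 0.0

    # Closed-to-open ratio per cycle: closed-phase length over open-phase length.
    ratios = [
        (o - c) / (c_next - o)
        for c, o, c_next in zip(closures, openings, closures[1:])
    ]
    return pitch_hz, jitter, mean(ratios)
```

For example, with closure instants every 160 samples at 16 kHz and each opening 64 samples after its closure, `glottal_measures([0, 160, 320, 480], [64, 224, 384], 16000)` describes a perfectly periodic 100 Hz voice with zero jitter and a closed phase two thirds as long as the open phase.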


Cite as: Smith, G., Robinson, T. (2000) Segmentation of a speech waveform according to glottal open and closed phases using an autoregressive-HMM. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 1, 469-472

@inproceedings{smith00b_icslp,
  author={Gavin Smith and Tony Robinson},
  title={{Segmentation of a speech waveform according to glottal open and closed phases using an autoregressive-HMM}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 1, 469-472}
}