Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Segmentation of a Speech Waveform According to Glottal Open and Closed Phases Using an Autoregressive-HMM

Gavin Smith, Tony Robinson

Cambridge University Engineering Department, Trumpington Street, Cambridge, UK

This paper presents an algorithm to segment speech according to glottal open and closed phases using the time waveform alone. From this segmentation, pitch, jitter and closed-to-open glottal ratios can be computed. Segmentation is achieved by identifying spectral changepoints at the sub-pitch period timescale. Changepoints are identified using a 3-state autoregressive hidden Markov model (AR-HMM) operating on the time waveform, with the Liljencrants-Fant (LF) glottal model as a theoretical basis. Model parameters and the optimal state sequence are determined using the expectation-maximisation (EM) algorithm and a bounded state duration (BSD) Viterbi algorithm, respectively. Experiments on synthetic speech give encouraging glottal segmentation for modal, fry and breathy voice types. Experiments on real speech obtained from TIMIT also give meaningful segmentations.
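As an illustration of how pitch, jitter and the closed-to-open ratio might be derived once the phase boundaries are known, the following is a minimal sketch. It is not taken from the paper. The function name `glottal_measures`, the input layout (lists of closure and opening instants as sample indices) and the jitter definition (mean absolute difference between consecutive periods, divided by the mean period) are all assumptions made for this example.

```python
def glottal_measures(closures, openings, sample_rate):
    """Hypothetical helper: derive pitch, jitter and closed/open ratio.

    closures    -- sample indices of successive glottal closure instants
    openings    -- sample index of the glottal opening inside each cycle,
                   so len(openings) == len(closures) - 1
    sample_rate -- sampling rate in Hz
    """
    if len(closures) < 3 or len(openings) != len(closures) - 1:
        raise ValueError("need at least 3 closures and one opening per cycle")
    for c, o, n in zip(closures, openings, closures[1:]):
        if not c < o < n:
            raise ValueError("each opening must lie strictly inside its cycle")

    # Pitch periods in seconds: time between consecutive closure instants.
    periods = [(n - c) / sample_rate for c, n in zip(closures, closures[1:])]
    mean_period = sum(periods) / len(periods)
    pitch_hz = 1.0 / mean_period

    # Assumed jitter definition: mean absolute difference between
    # consecutive periods, normalised by the mean period.
    diffs = [abs(q - p) for p, q in zip(periods, periods[1:])]
    jitter = (sum(diffs) / len(diffs)) / mean_period

    # Closed phase: closure to opening. Open phase: opening to next closure.
    closed = sum(o - c for c, o in zip(closures, openings))
    opened = sum(n - o for o, n in zip(openings, closures[1:]))
    closed_open_ratio = closed / opened

    return pitch_hz, jitter, closed_open_ratio


# Usage with made-up boundaries at 16 kHz (perfectly periodic, 160-sample cycles).
pitch, jitter, ratio = glottal_measures(
    closures=[0, 160, 320, 480], openings=[64, 224, 384], sample_rate=16000
)
```

In practice the boundary lists would come from the decoded state sequence; how the three AR-HMM states map onto closure and opening instants is not specified in the abstract, so that mapping is left out of the sketch.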


Bibliographic reference.  Smith, Gavin / Robinson, Tony (2000): "Segmentation of a speech waveform according to glottal open and closed phases using an autoregressive-HMM", In ICSLP-2000, vol.1, 469-472.