Glottal Closure Instants Detection from Speech Signal by Deep Features Extracted from Raw Speech and Linear Prediction Residual

Gurunath Reddy M., K. Sreenivasa Rao, Partha Pratim Das


Glottal closure instants (GCIs), also called instants of significant excitation, occur at the abrupt closure of the vocal folds. Detecting them is a well-studied problem because of its many potential applications in speech processing. The speech signal and its linear prediction residual (LPR) are the most popular signal representations for GCI detection. In this paper, we propose a supervised, classification-based GCI detection method in which we train multiple convolutional neural networks to determine the most suitable feature representation for efficient GCI detection. We also show that the combined model trained with joint acoustic-residual deep features and the model trained with low-pass filtered speech significantly increase the detection accuracy. We manually annotated the ground-truth GCIs in the speech signal using the electroglottograph (EGG) signal as a reference. The evaluation results show that the proposed model, trained with a very small and less diverse dataset, performs significantly better than traditional signal processing approaches and the most recent data-driven approaches.
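For readers unfamiliar with the LPR, the following is a minimal sketch of how a linear prediction residual can be obtained from a speech frame by the standard autocorrelation method followed by inverse filtering. It is an illustration only, not the authors' implementation: the function name `lp_residual`, the prediction order, the Hamming window, and the small regularization term are all assumptions that do not come from the paper.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter


def lp_residual(frame, order=12):
    """Return the linear prediction residual of one speech frame.

    Hypothetical helper: the order and window are illustrative choices,
    not values taken from the paper.
    """
    x = np.asarray(frame, dtype=float)
    # Window the frame before estimating the autocorrelation.
    w = x * np.hamming(len(x))
    # Autocorrelation at lags 0..order (lag 0 sits at index len(w) - 1).
    r = np.correlate(w, w, mode="full")[len(w) - 1 : len(w) + order]
    # Tiny regularization (an assumption) to keep the system well conditioned.
    r[0] *= 1.0 + 1e-9
    # Solve the Toeplitz normal equations R a = r[1:] for the predictor.
    a = solve_toeplitz(r[:order], r[1 : order + 1])
    # Inverse filter A(z) = 1 - sum_k a_k z^{-k} applied to the unwindowed frame.
    inverse_filter = np.concatenate(([1.0], -a))
    return lfilter(inverse_filter, [1.0], x)
```

For voiced speech, the residual produced this way tends to show sharp peaks near the instants of significant excitation, which is why it is a common input representation for GCI detection.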


DOI: 10.21437/Interspeech.2019-1981

Cite as: M., G.R., Rao, K.S., Das, P.P. (2019) Glottal Closure Instants Detection from Speech Signal by Deep Features Extracted from Raw Speech and Linear Prediction Residual. Proc. Interspeech 2019, 156-160, DOI: 10.21437/Interspeech.2019-1981.


@inproceedings{M.2019,
  author={Gurunath Reddy M. and K. Sreenivasa Rao and Partha Pratim Das},
  title={{Glottal Closure Instants Detection from Speech Signal by Deep Features Extracted from Raw Speech and Linear Prediction Residual}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={156--160},
  doi={10.21437/Interspeech.2019-1981},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1981}
}