Non-Local Estimation of Speech Signal for Vowel Onset Point Detection in Varied Environments

Avinash Kumar, S. Shahnawazuddin, Gayadhar Pradhan


The vowel onset point (VOP), the instant at which a vowel begins, is an important piece of information extensively employed in speech analysis and synthesis. Detecting the VOPs in a given speech sequence, independent of the text context and recording environment, is a challenging area of research. The performance of existing VOP detection methods has not yet been extensively studied under varied environmental conditions. In this paper, we exploit non-local means estimation to detect those regions of the speech sequence that have a high signal-to-noise ratio and exhibit periodicity. These regions are mostly vowel regions, so focusing on them helps overcome the ill effects of environmental degradation. Next, for each short-time frame of the estimated speech sequence, we cumulatively sum the magnitude of the corresponding Fourier transform spectrum. This cumulative sum is then used as the feature for detecting VOPs. Experiments conducted on the TIMIT database show that the proposed approach provides better detection and spurious rates than a few existing methods under both clean and noisy test conditions.


DOI: 10.21437/Interspeech.2017-624

Cite as: Kumar, A., Shahnawazuddin, S., Pradhan, G. (2017) Non-Local Estimation of Speech Signal for Vowel Onset Point Detection in Varied Environments. Proc. Interspeech 2017, 429-433, DOI: 10.21437/Interspeech.2017-624.


@inproceedings{Kumar2017,
  author={Avinash Kumar and S. Shahnawazuddin and Gayadhar Pradhan},
  title={Non-Local Estimation of Speech Signal for Vowel Onset Point Detection in Varied Environments},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={429--433},
  doi={10.21437/Interspeech.2017-624},
  url={http://dx.doi.org/10.21437/Interspeech.2017-624}
}