ISCA Archive SPKD 2008

Effective segmentation based on vocal effort change point detection

Chi Zhang, John H. L. Hansen

Non-neutral speech data has a strong negative impact on speech processing systems such as Automatic Speech Recognition (ASR) and speaker ID systems [1]. It is therefore necessary to detect and segment non-neutral speech before further processing. The detection and segmentation of non-neutral segments in an input speech stream can also be used in speech analysis and understanding, or in speech file retrieval systems to find files containing whispered speech, which may represent sensitive information, or shouted speech, which may denote strong emotion. This study addresses the segmentation problem for vocal effort change by deploying an improved feature-based T2-BIC algorithm. Several features are considered as input to the T2-BIC algorithm. A new fused evaluation criterion, the Multi-Error Score (MES), is proposed to determine which feature conveys the most information about vocal effort. Results show that the energy ratio feature yields the lowest mean MES (56.49) when speech of differing vocal effort is segmented by vocal effort change point detection. Finally, recommendations are made for integrating this framework to advance knowledge processing for subsequent speech systems.


Cite as: Zhang, C., Hansen, J.H.L. (2008) Effective segmentation based on vocal effort change point detection. Proc. ISCA ITRW on Speech Analysis and Processing for Knowledge Discovery, paper 034

@inproceedings{zhang08_spkd,
  author={Chi Zhang and John H. L. Hansen},
  title={{Effective segmentation based on vocal effort change point detection}},
  year=2008,
  booktitle={Proc. ISCA ITRW on Speech Analysis and Processing for Knowledge Discovery},
  pages={paper 034}
}