ISCA Archive ICSLP 2000

A c/v segmentation method for Mandarin speech based on multiscale fractal dimension

Fan Wang, Fang Zheng, Wenhu Wu

This paper proposes a new algorithm for Consonant and Vowel (C/V) segmentation of Mandarin speech based on fractal theory. The method searches for the transient region between the consonant and vowel parts of a Mandarin syllable, which in general is a consonant followed by a vowel. The Multiscale Fractal Dimension (MFD) set consists of the fractal dimensions computed at multiple maximum resolutions. The r-variance of the MFD (a measure of how much the elements of an MFD differ from one another) is used on its own to distinguish clearly between stable phonemes and their transient region, so the algorithm can directly select the speech frame with minimum r-variance of MFD as the C/V segmentation boundary. The method achieves 95.2% segmentation accuracy on a clean test corpus and 82.3% accuracy in a noisy environment with an SNR of 10 dB. These results indicate that the new C/V segmentation algorithm is suitable for continuous Mandarin speech recognition.
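The frame search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the fractal dimension estimator or the exact r-variance formula, so the code below assumes a Higuchi-style estimator and uses the plain variance of the MFD elements as a stand-in for r-variance. The names `fractal_dimension`, `mfd`, and `cv_boundary`, along with the frame length, hop size, and scale set, are hypothetical choices made for illustration.

```python
import numpy as np


def fractal_dimension(frame, k_max):
    """Higuchi-style fractal dimension estimate using lags 1..k_max (assumed estimator)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    ks = np.arange(1, k_max + 1)
    # Normalized curve length at each lag k; the tiny offset avoids log(0) on flat frames.
    lengths = np.array([
        np.mean(np.abs(frame[k:] - frame[:-k])) * (n - 1) / k**2 + 1e-12
        for k in ks
    ])
    # The slope of log L(k) against log(1/k) is the dimension estimate.
    slope, _ = np.polyfit(np.log(1.0 / ks), np.log(lengths), 1)
    return slope


def mfd(frame, max_scales=(4, 8, 16, 32)):
    """One fractal dimension per maximum scale (hypothetical scale set)."""
    return np.array([fractal_dimension(frame, k) for k in max_scales])


def cv_boundary(signal, frame_len=256, hop=128, max_scales=(4, 8, 16, 32)):
    """Return the index of the frame whose MFD elements vary least, plus all variances.

    Plain variance stands in for the paper's r-variance, which the abstract
    does not define precisely.
    """
    signal = np.asarray(signal, dtype=float)
    starts = range(0, len(signal) - frame_len + 1, hop)
    spread = np.array([
        np.var(mfd(signal[s:s + frame_len], max_scales)) for s in starts
    ])
    return int(np.argmin(spread)), spread
```

Calling `cv_boundary(syllable_samples)` on the samples of one syllable returns a frame index, which maps back to a sample position as `index * hop`. The minimum is taken because the abstract states that the boundary is the frame with minimum r-variance.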


Cite as: Wang, F., Zheng, F., Wu, W. (2000) A c/v segmentation method for Mandarin speech based on multiscale fractal dimension. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 4, 648-651

@inproceedings{wang00r_icslp,
  author={Fan Wang and Fang Zheng and Wenhu Wu},
  title={{A c/v segmentation method for Mandarin speech based on multiscale fractal dimension}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={4},
  pages={648--651}
}