Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

A C/V Segmentation Method for Mandarin Speech Based on Multiscale Fractal Dimension

Fan Wang, Fang Zheng, Wenhu Wu

Center of Speech Technology, State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science & Technology, Tsinghua University, Beijing, China

This paper proposes a new algorithm for Consonant/Vowel (C/V) segmentation of Mandarin speech based on fractal theory. The method searches for the transient region between the consonant and vowel parts of a Mandarin syllable, which in general is a consonant followed by a vowel. The Multiscale Fractal Dimension set (MFD) is the set of fractal dimensions computed at multiple maximum resolutions. The r-variance of the MFD (the degree of difference among all elements of an MFD) alone clearly distinguishes stable phonemes from their transient regions, so the algorithm directly takes the speech frame with the minimum r-variance of the MFD as the C/V segmentation boundary. The method achieves 95.2% segmentation accuracy on a clean test corpus and 82.3% accuracy in a noisy environment with an SNR of 10 dB. These results indicate that the new C/V segmentation algorithm is suitable for continuous Mandarin speech recognition.
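The boundary search described in the abstract can be illustrated with a minimal sketch. It assumes that an MFD vector has already been computed for every speech frame, and it covers only the final selection step. The abstract does not give the formula for the r-variance, so the population variance of each frame's MFD elements is used here as a stand-in. The function names `r_variance` and `find_cv_boundary` and the toy values are hypothetical and are not taken from the paper.

```python
from statistics import pvariance


def r_variance(mfd):
    """Spread of one frame's MFD elements.

    Assumption: the paper's r-variance is approximated here by the
    plain population variance; the exact definition may differ.
    """
    return pvariance(mfd)


def find_cv_boundary(mfd_per_frame):
    """Return the index of the frame with the minimum r-variance."""
    if not mfd_per_frame:
        raise ValueError("need at least one frame")
    scores = [r_variance(mfd) for mfd in mfd_per_frame]
    # Index of the smallest score; ties resolve to the earliest frame.
    return min(range(len(scores)), key=scores.__getitem__)


# Toy, made-up MFD vectors (one per frame, three scales each).
frames = [
    [1.90, 1.60, 1.30],
    [1.80, 1.60, 1.40],
    [1.55, 1.50, 1.45],
    [1.20, 1.40, 1.70],
]
boundary = find_cv_boundary(frames)
```

In this toy input the third frame (index 2) has the most uniform MFD elements, so it is the frame selected as the candidate C/V boundary.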


