INTERSPEECH 2004 - ICSLP
8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Precise Phone Boundary Detection Using Wavelet Packet and Recurrent Neural Networks

Farshad Almasganj

Amirkabir University, Iran

Automatic segmentation is an important research subject in speech processing. Many approaches have been developed in this field, and good results have been reported. In this paper we show that using wavelet packet coefficients as features overcomes the tradeoff between frequency and time resolution that affects conventional spectral features such as MFCC and leads to low time resolution. In addition, phone boundaries are transitional in nature, a consequence of the limits on vocal tract movement: there is usually a transition zone between two dissimilar phones. We exploit this phenomenon in segmentation by feeding features from just before and just after the boundary to the input of the segmentation model, which gave good results. We also found that a more dynamic segmentation model that can follow these dynamics, such as a recurrent neural network, locates phone boundaries more precisely. We tested our approach using two sets of Persian utterances, for training and testing. Experimental results showed an overall tolerance of 8.14 milliseconds for the detected phone boundaries.
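As a rough illustration of the feature side of this idea, the sketch below computes wavelet packet subband energies for short frames and concatenates the frames before and after a candidate boundary into one input vector. It is a minimal sketch under stated assumptions, not the paper's implementation: the Haar filter, the decomposition depth, the log-energy features, the context width, and the function names are all hypothetical choices made for the example, and the recurrent neural network that would consume these vectors is not shown.

```python
import math

def haar_step(x):
    """One orthonormal Haar analysis step: (approximation, detail), each half length."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def wavelet_packet_energies(frame, levels=3):
    """Energy of each of the 2**levels leaf nodes of a full wavelet packet tree.

    Assumes len(frame) is divisible by 2**levels. Unlike a plain wavelet
    transform, every node (approximation and detail) is split at each level.
    Leaves are returned in natural tree order, not sorted by frequency.
    """
    nodes = [list(frame)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return [sum(c * c for c in node) for node in nodes]

def context_features(frames, center, context=2, levels=3):
    """Concatenate log subband energies of the frames around `center`.

    Frames from `center - context` to `center + context` are used, so the
    model input covers the signal both before and after a candidate boundary.
    Indices past either end are clamped to the first or last frame.
    """
    feats = []
    for k in range(center - context, center + context + 1):
        k = min(max(k, 0), len(frames) - 1)
        energies = wavelet_packet_energies(frames[k], levels)
        feats.extend(math.log(e + 1e-12) for e in energies)  # small floor avoids log(0)
    return feats

# Hypothetical toy input: five 16-sample frames of a sinusoid.
frames = [[math.sin(0.3 * n + i) for n in range(16)] for i in range(5)]
vector = context_features(frames, center=2, context=2, levels=3)
```

With three levels and a context of two frames on each side, each input vector holds 5 × 8 = 40 values. Short frames keep the time resolution high, while the packet tree still splits the band into several subbands.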

Bibliographic reference. Almasganj, Farshad (2004): "Precise phone boundary detection using wavelet packet and recurrent neural networks", in INTERSPEECH-2004, 2761-2764.