Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Explicit Segmentation of Speech Based on Frequency-Domain AR Modeling

T. Nagarajan, Douglas O'Shaughnessy

Université du Québec, Canada

In the development of a syllable-centric Automatic Speech Recognition (ASR) system, segmentation of the speech signal into syllabic units is an important stage. In [1], an implicit algorithm is presented for segmenting the continuous speech signal into syllable-like units without using the orthographic transcription. In the present study, a new explicit segmentation algorithm is proposed and analyzed that uses the orthographic transcription of the given continuous speech signal. The advantage of using the transcription during segmentation is that the number of syllable segments present in the speech signal is known a priori. Although the short-term energy (STE) function contains useful information about syllable segment boundaries, it cannot be used directly for segmentation because of significant local energy fluctuations. In the present work, an Auto-Regressive (AR) model-based algorithm is presented that smooths the STE function using the known number of syllable segments in the given speech signal. Experiments carried out on the TIMIT speech corpus show that the segmentation error is at most 40 ms for 87.84% of the syllable segments.
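The abstract does not give the details of the algorithm, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the STE function is treated as if it were a one-sided power spectrum, an all-pole (AR) model is fitted to it through the autocorrelation (Yule-Walker) equations, and the model order is tied to the known syllable count (for example, twice the number of syllables, which is an assumption made here). The function names `short_term_energy` and `ar_smooth`, and the parameters `frame_len`, `hop`, and `order`, are hypothetical.

```python
import numpy as np


def short_term_energy(signal, frame_len, hop):
    """Sum of squared samples in each frame of length frame_len, stepped by hop."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.array([
        np.sum(signal[i * hop:i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])


def ar_smooth(ste, order):
    """Smooth an STE contour with an all-pole envelope of the given order.

    Assumption: the STE contour is treated as a one-sided power spectrum.
    This is an illustrative reading of "frequency-domain AR modeling".
    """
    # Mirror the contour so it is even-symmetric, like a real signal's spectrum.
    spectrum = np.concatenate([ste, ste[-2:0:-1]])
    # The inverse DFT of a power spectrum gives an autocorrelation sequence.
    autocorr = np.fft.ifft(spectrum).real
    r = autocorr[:order + 1]
    # Yule-Walker equations: R a = -r[1:], with R a Toeplitz matrix.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    coeffs = np.linalg.solve(R, -r[1:])
    a = np.concatenate([[1.0], coeffs])
    # Prediction error power sets the gain of the all-pole model.
    gain = r[0] + np.dot(coeffs, r[1:])
    # Evaluate gain / |A|^2 on the same grid as the mirrored contour.
    envelope = gain / np.abs(np.fft.fft(a, n=len(spectrum))) ** 2
    return envelope[:len(ste)]
```

Under these assumptions, a low model order limits how many peaks the smoothed contour can have, so the local minima of `ar_smooth(ste, order)` could serve as candidate syllable boundaries where the raw STE contour would show many spurious ones.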

Bibliographic reference. Nagarajan, T. / O'Shaughnessy, Douglas (2005): "Explicit segmentation of speech based on frequency-domain AR modeling", in INTERSPEECH-2005, 653-656.