In this paper, we present a new method for segmenting speech at the phoneme level, based on the short-time Fourier transform of the speech signal. The goal is to locate the main changes in spectral energy over time, which can be interpreted as phoneme boundaries. For further precision, we also apply a sub-band analysis and search for energy changes within individual bands. Moreover, we employ the modified group-delay function to obtain a clearer representation of the boundary locations and to smooth out undesired fluctuations of the signal. We also study the use of an auditory spectrogram in place of a regular spectrogram in the segmentation process. Since the method uses only the power spectrum of the signal, it requires no parameter adaptation or advance training for different speakers. In addition, no transcript information, such as the phonemes themselves, or voiced/unvoiced decision making is required. The method was tested on the phonetically diverse part of the TIMIT database, and the results show that 87% of the boundaries are successfully detected.
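As a rough illustration of the general idea of locating boundaries from sub-band energy changes in a short-time power spectrum, the following minimal sketch may help. It is not the authors' implementation: the frame length, hop size, number of bands, log-energy difference measure, relative threshold, and all function names are assumptions chosen for the example, and the modified group-delay smoothing and the auditory spectrogram are omitted entirely. The test signal is a synthetic pair of tones, not speech.

```python
import cmath
import math


def power_spectrogram(signal, frame_len=64, hop=32):
    """Short-time power spectrum using a Hann window and a naive DFT."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequency bins only
    twiddle = [[cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)] for k in range(n_bins)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        x = [signal[start + n] * window[n] for n in range(frame_len)]
        frames.append([abs(sum(xn * w for xn, w in zip(x, row))) ** 2
                       for row in twiddle])
    return frames


def band_log_energies(frames, n_bands=4, floor=1e-10):
    """Sum each frame's power into equal-width bands, then take the log."""
    n_bins = len(frames[0])
    edges = [b * n_bins // n_bands for b in range(n_bands + 1)]
    return [[math.log(sum(f[edges[b]:edges[b + 1]]) + floor)
             for b in range(n_bands)] for f in frames]


def detect_boundaries(signal, frame_len=64, hop=32, n_bands=4,
                      rel_threshold=0.5):
    """Return sample positions where the sub-band energies change sharply."""
    log_e = band_log_energies(
        power_spectrogram(signal, frame_len, hop), n_bands)
    # Change measure (an assumption): summed absolute log-energy difference
    # between consecutive frames, over all bands.
    change = [0.0] + [sum(abs(a - b) for a, b in zip(cur, prev))
                      for prev, cur in zip(log_e, log_e[1:])]
    threshold = rel_threshold * max(change)
    boundaries = []
    for i in range(1, len(change) - 1):
        is_peak = change[i] > change[i - 1] and change[i] >= change[i + 1]
        if is_peak and change[i] > threshold:
            # Midpoint between the centres of frames i-1 and i.
            boundaries.append(i * hop + (frame_len - hop) // 2)
    return boundaries


# Synthetic example: a 500 Hz tone followed by a 2500 Hz tone at 8 kHz,
# with the true change at sample 2048.
fs = 8000
signal = [math.sin(2 * math.pi * (500 if n < 2048 else 2500) * n / fs)
          for n in range(4096)]
boundaries = detect_boundaries(signal)
print(boundaries)
```

On this synthetic signal the sketch reports a single boundary within one frame of the true change point. Because the threshold is relative to the largest change in the signal, it would need a different decision rule for real speech, where boundaries vary widely in strength.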
Bibliographic reference. Golipour, Ladan / O'Shaughnessy, Douglas (2007): "A new approach for phoneme segmentation of speech signals", In INTERSPEECH-2007, 1933-1936.