4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
Results from applying an improved algorithm to the automatic segmentation of spontaneous telephone-quality speech are presented and compared to those obtained when white noise is superimposed on the speech. Three segmentation algorithms are compared, all based on variants of the Spectral Variation Function (SVF). Experimental results are obtained on the OGI multi-language telephone speech corpus (OGI TS). We show that the use of the auditory forward and backward masking effects prior to the SVF computation increases the robustness of the algorithm to white noise. When the average signal-to-noise ratio (SNR) is decreased to 10 dB, the peak ratio (defined as the ratio of the number of peaks measured at the target SNR to the number measured at the original SNR) is increased by 16%, 12%, and 11% for the MFC (Mel-Frequency Cepstra), RASTA (RelAtive SpecTrAl processing), and FBDYN (Forward-Backward auditory masking DYNamic cepstra) SVF segmentation algorithms, respectively.
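The noise condition and the peak ratio defined above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Gaussian noise source, and the simple local-maximum peak picker are all assumptions made for illustration, and the abstract does not specify how peaks in the SVF curve are detected.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Superimpose Gaussian white noise so the average SNR equals snr_db.

    Assumption: SNR is the ratio of mean signal power to mean noise power
    over the whole utterance, expressed in dB.
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # Noise power required to reach the requested SNR.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def count_peaks(curve, threshold=0.0):
    """Count strict local maxima above threshold (hypothetical peak picker)."""
    return sum(
        1
        for i in range(1, len(curve) - 1)
        if curve[i] > threshold and curve[i - 1] < curve[i] > curve[i + 1]
    )


def peak_ratio(peaks_at_target_snr, peaks_at_original_snr):
    """Ratio of peak counts: target SNR over original SNR."""
    return peaks_at_target_snr / peaks_at_original_snr
```

Under this definition, a peak ratio above 1 means that more peaks are found in the noisy condition than in the original one, so a reported increase of 16% corresponds to a ratio of 1.16.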
Bibliographic reference. Petek, Bojan / Andersen, Ove / Dalsgaard, Paul (1996): "On the robust automatic segmentation of spontaneous speech", In ICSLP-1996, 913-916.