We propose a novel technique for robust voiced/unvoiced segment detection in noisy speech, based on local polynomial regression. The local polynomial model is well-suited for voiced segments in speech. The unvoiced segments are noise-like and do not exhibit any smooth structure. This property of smoothness is used for devising a new metric called the variance ratio metric, which, after thresholding, indicates the voiced/unvoiced boundaries with 75% accuracy for 0dB global signal-to-noise ratio (SNR). A novelty of our algorithm is that it processes the signal continuously, sample-by-sample rather than frame-by-frame. Simulation results on TIMIT speech database (downsampled to 8kHz) for various SNRs are presented to illustrate the performance of the new algorithm. Results indicate that the algorithm is robust even in high noise levels.
Bibliographic reference. Murthy, A. Sreenivasa / Sekhar, S. Chandra / Sreenivas, T. V. (2007): "Robust and high-resolution voiced/unvoiced classification in noisy speech using a signal smoothness criterion", In INTERSPEECH-2007, 2965-2968.