A method of speech enhancement is developed that reconstructs clean speech from a set of acoustic features using a sinusoidal model of speech. This is a significant departure from traditional filtering-based methods of speech enhancement. A major challenge with this approach is the accurate estimation of the acoustic features (voicing, fundamental frequency, and spectral envelope) from noisy speech. This is achieved using maximum a posteriori (MAP) estimation methods that operate on the noisy speech. Objective results are used to optimise the proposed system, and a set of subjective tests compares the approach with traditional enhancement methods.
Index Terms: speech enhancement, MAP, sinusoidal model
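To make the reconstruction idea in the abstract concrete, the following is a minimal sketch of how one voiced frame can be synthesised from a fundamental frequency and a set of harmonic amplitudes (for example, samples of a spectral envelope at the harmonic frequencies). It illustrates a generic harmonic sinusoidal model only, not the authors' implementation: the function name `synthesise_voiced_frame`, the zero-phase cosines, the 160-sample frame length, and the omission of unvoiced frames are all assumptions made for illustration. The 8 kHz default matches the sampling frequency quoted for the demonstration utterance.

```python
import math


def synthesise_voiced_frame(f0_hz, harmonic_amps, sample_rate=8000, n_samples=160):
    """Sum cosines at integer multiples of f0_hz (hypothetical helper).

    harmonic_amps[k-1] is the amplitude of the k-th harmonic. All phases are
    set to zero, which is an assumption of this sketch.
    """
    nyquist = sample_rate / 2.0
    frame = [0.0] * n_samples
    for k, amp in enumerate(harmonic_amps, start=1):
        freq = k * f0_hz
        if freq >= nyquist:
            break  # skip harmonics at or above Nyquist to avoid aliasing
        step = 2.0 * math.pi * freq / sample_rate  # phase advance per sample
        for n in range(n_samples):
            frame[n] += amp * math.cos(step * n)
    return frame


# Example: a 100 Hz voiced frame with three harmonics of decaying amplitude.
frame = synthesise_voiced_frame(100.0, [1.0, 0.5, 0.25])
```

In a complete system of the kind the abstract describes, the inputs to such a function would be the voicing, fundamental frequency, and spectral envelope estimates obtained from the noisy speech, so the output quality depends on how robust those estimates are.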
Utterance: "Look out of the window and see if it's raining"; speaker: Nuance_Catherine; sampling frequency: 8 kHz; noise: street noise from the AURORA framework; SNRs: 15 dB, 5 dB, 0 dB
No noise compensation: 0 dB, 5 dB, 15 dB
Spectral subtraction: 0 dB, 5 dB, 15 dB
Wiener filtering: 0 dB, 5 dB, 15 dB
Log MMSE: 0 dB, 5 dB, 15 dB
Sinusoidal model-based method with HMM-based phoneme labels and MAP-based pitch estimation: 0 dB, 5 dB, 15 dB
Bibliographic reference. Harding, Philip / Milner, Ben (2012): "Enhancing speech by reconstruction from robust acoustic features", In INTERSPEECH-2012, 943-946.