INTERSPEECH 2012
13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Enhancing Speech by Reconstruction from Robust Acoustic Features

Philip Harding, Ben Milner

University of East Anglia, Norwich, UK

A method of speech enhancement is developed that reconstructs clean speech from a set of acoustic features using a sinusoidal model of speech. This is a significant departure from traditional filtering-based methods of speech enhancement. A major challenge with this approach is accurately estimating the acoustic features (voicing, fundamental frequency and spectral envelope) from noisy speech. This is achieved using maximum a posteriori (MAP) estimation methods that operate on the noisy speech. Objective results are used to optimise the proposed system, and a set of subjective tests compares the approach with traditional enhancement methods.
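
As a rough illustration of reconstruction from acoustic features, the sketch below synthesises a single frame as a sum of sinusoids driven by the three features named in the abstract: voicing, fundamental frequency and spectral envelope. It is not the authors' implementation. The function names, the 10 ms frame length, the zero-phase harmonics, the 100 Hz spacing for unvoiced frames and the representation of the envelope as a callable are all assumptions made for this example; only the 8 kHz sampling frequency is taken from the audio examples on this page.

```python
import math
import random


def sinusoid_frequencies(spacing_hz, fs):
    """Return multiples of spacing_hz that lie strictly below the Nyquist frequency."""
    freqs = []
    k = 1
    while k * spacing_hz < fs / 2.0:
        freqs.append(k * spacing_hz)
        k += 1
    return freqs


def synthesise_frame(f0_hz, voiced, envelope, fs=8000, n_samples=80, seed=0):
    """Synthesise one frame from (voicing, f0, spectral envelope).

    envelope is a hypothetical callable mapping a frequency in Hz to a
    linear amplitude. The defaults (80 samples at 8 kHz, i.e. 10 ms)
    are illustrative assumptions.
    """
    rng = random.Random(seed)
    if voiced:
        # Voiced frame: sinusoids at harmonics of f0, zero phase (assumed).
        freqs = sinusoid_frequencies(f0_hz, fs)
        phases = [0.0] * len(freqs)
    else:
        # Unvoiced frame: sinusoids every 100 Hz with random phase (assumed).
        freqs = sinusoid_frequencies(100.0, fs)
        phases = [rng.uniform(-math.pi, math.pi) for _ in freqs]

    # Each sinusoid takes its amplitude from the envelope at its frequency.
    amps = [envelope(f) for f in freqs]

    frame = []
    for n in range(n_samples):
        t = n / fs
        frame.append(sum(a * math.cos(2.0 * math.pi * f * t + p)
                         for a, f, p in zip(amps, freqs, phases)))
    return frame


def flat_envelope(freq_hz):
    """Toy envelope with unit amplitude at every frequency."""
    return 1.0


voiced_frame = synthesise_frame(f0_hz=1000.0, voiced=True, envelope=flat_envelope)
unvoiced_frame = synthesise_frame(f0_hz=0.0, voiced=False, envelope=flat_envelope)
```

In a complete system the features would be estimated from the noisy speech for every frame, and consecutive frames would need to be joined smoothly (for example by overlap-add); neither step is shown here.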

Index Terms: speech enhancement, MAP, sinusoidal model

Audio Examples

Utterance: "Look out of the window and see if it's raining"
Speaker: Nuance_Catherine
Sampling frequency: 8 kHz
Noise: street noise from the AURORA framework
SNRs: 0 dB, 5 dB, 15 dB

Method                                                    SNRs
Original
No noise compensation                                     0 dB, 5 dB, 15 dB
Spectral subtraction                                      0 dB, 5 dB, 15 dB
Wiener filtering                                          0 dB, 5 dB, 15 dB
Log MMSE                                                  0 dB, 5 dB, 15 dB
Sinusoidal model-based method with HMM-based phoneme
labels and MAP-based pitch estimation                     0 dB, 5 dB, 15 dB

Bibliographic reference. Harding, Philip / Milner, Ben (2012): "Enhancing speech by reconstruction from robust acoustic features", in INTERSPEECH-2012, 943-946.