This work presents an approach to speech enhancement that operates using a speech production model to reconstruct a clean speech signal from a set of speech parameters that are estimated from the noisy speech. The motivation is to remove the distortion and residual and musical noises that are associated with conventional filtering-based methods of speech enhancement. The STRAIGHT vocoder forms the model for speech reconstruction and requires a time-frequency surface and fundamental frequency information. Hidden Markov model synthesis is used to create an estimate of the time-frequency surface and this is combined with the noisy surface using a perceptually motivated signal-to-noise ratio weighting. Experimental results compare the proposed reconstruction-based method to conventional filtering-based approaches of speech enhancement.
Bibliographic reference. Kato, Akihiro / Milner, Ben (2014): "Using hidden Markov models for speech enhancement", In INTERSPEECH-2014, 2695-2699.