In this paper, we present a speech and audio analysis-synthesis method based on a basilar membrane (BM) model. The method represents the audio signal by the Hilbert envelopes of the responses of complex gammatone filters uniformly spaced on a critical band scale. We show that, for speech and audio signals, a perceptually equivalent signal can be reconstructed from the envelopes alone by an iterative procedure that estimates the carrier associated with each envelope. Low-pass filtering and sampling the envelopes reduces the rate needed to represent them, and we show that a signal without audible distortion can still be recovered from the sampled envelopes. This may lead to improved perceptual coding methods.
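The analysis and reconstruction steps described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The fourth-order gammatone with a Glasberg-Moore ERB bandwidth, the 25 ms truncation, and the function names (`complex_gammatone_ir`, `analyse`, `envelope`, `reconstruct`) are all assumptions made for illustration. The loop in `reconstruct` is one plausible reading of "an iterative procedure that estimates the associated carrier": it starts from monochromatic carriers, then repeatedly re-analyses the synthesised signal and keeps only the phase of each channel.

```python
import cmath
import math

def erb_bandwidth(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore formula, an assumption here)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def complex_gammatone_ir(fc_hz, fs_hz, duration_s=0.025, order=4):
    """Sampled complex gammatone: t^(order-1) * exp(-2*pi*b*t) * exp(j*2*pi*fc*t)."""
    b = 1.019 * erb_bandwidth(fc_hz)
    gains = []
    for k in range(int(duration_s * fs_hz)):
        t = k / fs_hz
        gains.append(t ** (order - 1) * math.exp(-2.0 * math.pi * b * t))
    # Scale so that a unit-amplitude real tone at fc gives an envelope close to 1.
    norm = 0.5 * sum(gains)
    return [g / norm * cmath.exp(2j * math.pi * fc_hz * k / fs_hz)
            for k, g in enumerate(gains)]

def analyse(signal, ir):
    """Complex channel response: causal convolution of a real signal with ir."""
    out = []
    for n in range(len(signal)):
        acc = 0j
        for k in range(min(n + 1, len(ir))):
            acc += ir[k] * signal[n - k]
        out.append(acc)
    return out

def envelope(signal, ir):
    """Magnitude of the complex response, used here as the Hilbert envelope."""
    return [abs(y) for y in analyse(signal, ir)]

def synthesise(envelopes, carriers):
    """Sum over channels of envelope times the real part of the unit-modulus carrier."""
    n_samples = len(envelopes[0])
    return [sum(e[n] * c[n].real for e, c in zip(envelopes, carriers))
            for n in range(n_samples)]

def reconstruct(envelopes, irs, centre_freqs_hz, fs_hz, iterations=3):
    """Hypothetical carrier-estimation loop (illustrative, not the paper's exact procedure)."""
    n_samples = len(envelopes[0])
    # First estimate: monochromatic carriers at the channel centre frequencies.
    carriers = [[cmath.exp(2j * math.pi * fc * n / fs_hz) for n in range(n_samples)]
                for fc in centre_freqs_hz]
    for _ in range(iterations):
        signal = synthesise(envelopes, carriers)
        carriers = []
        for ir in irs:
            # Keep only the phase of the re-analysed channel as the new carrier.
            response = analyse(signal, ir)
            carriers.append([v / abs(v) if abs(v) > 1e-12 else 1.0 + 0j
                             for v in response])
    return synthesise(envelopes, carriers)
```

A single-channel call might look like `reconstruct([envelope(x, ir)], [ir], [1000.0], 8000.0)`, where `ir = complex_gammatone_ir(1000.0, 8000.0)`. The direct convolution is slow for long signals, and a practical system would also need to compensate for the overall filterbank transfer function, which this sketch ignores.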
|fspeech.wav||Original unmodified speech sample|
|fspeech0.wav||Original speech file bandpass filtered to match the system transfer function|
|fspeech1.wav||Speech signal reconstructed from envelopes: first estimate using monochromatic carriers|
|fspeech2.wav||Speech signal reconstructed from envelopes: final estimate after 200 iterations|
|fspeech3.wav||Speech signal reconstructed from modified envelopes: envelopes for channels above 1500 Hz sampled at 1.2 times their critical bandwidth (CBW)|
|fspeech4.wav||Speech signal reconstructed from modified envelopes: envelopes for channels above 1500 Hz sampled at 1.2 times their CBW, envelopes for channels below 1500 Hz sampled at 250 Hz|
|organ0.wav||Original music sample, bandpass filtered|
|organ1.wav||Music sample reconstructed from envelopes: estimate after 200 iterations|
|organ2.wav||Music sample reconstructed from envelopes: estimate after 1000 iterations|
|organ3.wav||Music sample reconstructed from envelopes: estimate after 200 iterations using a higher density of envelope channels (4 per CB, 123 total)|
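The envelope sampling rates quoted for fspeech4.wav can be made concrete with a short sketch. The source does not state which critical bandwidth formula was used, so the Zwicker-Terhardt approximation below is an assumption, as are the function and parameter names. Channels exactly at 1500 Hz are treated as low channels, which the source leaves unspecified.

```python
def critical_bandwidth_hz(fc_hz):
    """Critical bandwidth in Hz (Zwicker-Terhardt approximation, assumed here)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (fc_hz / 1000.0) ** 2) ** 0.69

def envelope_sample_rate_hz(fc_hz, split_hz=1500.0, factor=1.2, low_rate_hz=250.0):
    """Envelope sampling rate for a channel centred at fc_hz.

    Above split_hz the rate is factor times the critical bandwidth;
    at or below it, a fixed low_rate_hz is used.
    """
    if fc_hz > split_hz:
        return factor * critical_bandwidth_hz(fc_hz)
    return low_rate_hz

# Rates for a few illustrative centre frequencies (values chosen arbitrarily).
for fc in (500.0, 1000.0, 2000.0, 4000.0):
    print(fc, round(envelope_sample_rate_hz(fc), 1))
```

Under these assumptions the envelope rate grows with centre frequency above the split point, because the critical bandwidth widens, while the low channels share one fixed rate.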
Bibliographic reference. Thiemann, Joachim / Kabal, Peter (2007): "Reconstructing audio signals from modified non-coherent Hilbert envelopes", in INTERSPEECH-2007, 534-537.