An Optimization Framework for Recovery of Speech from Phase-Encoded Spectrograms

Abhilash Sainathan, Sunil Rudresh, Chandra Sekhar Seelamantula


In general, reconstruction of a speech signal from its spectrogram is non-unique because the phase spectrum is unavailable. Assuming zero phase would result in a minimum-phase reconstruction. This limitation is overcome by computing the recently introduced phase-encoded spectrogram. In this approach, one modifies each frame of a speech signal to possess the causal, delta-dominant (CDD) property prior to computing the spectrogram. In an earlier publication, we showed that finite-length CDD sequences can be retrieved exactly from their magnitude spectra using a cepstrum technique. Although exactness is guaranteed in principle, practical implementations achieve a high but limited reconstruction accuracy. In this paper, we focus on increasing the reconstruction accuracy. We formulate the reconstruction problem within an optimization framework and deploy a recently proposed iterative alternating direction method of multipliers (ADMM) algorithm called autocorrelation retrieval – Kolmogorov factorization (CoRK). Experimental validations show that the CoRK algorithm results in a reconstruction accurate up to machine precision. We also show that both the CoRK and cepstrum techniques are robust and invariant to the choice of the window duration, the amount of overlap between consecutive speech frames, the strength of the delta used to impart the CDD property, and the presence of noise.
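
The cepstrum route mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation and it does not cover CoRK. It assumes that a frame is made CDD by prepending a delta whose amplitude exceeds the sum of absolute values of the frame samples, which makes the sequence minimum phase, and it then applies the standard real-cepstrum (homomorphic) minimum-phase reconstruction. The function names `make_cdd` and `recover_from_magnitude` and the `margin` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np

def make_cdd(frame, margin=2.0):
    """Prepend a dominant delta to a frame (assumed way of imparting CDD).

    The delta amplitude is margin * sum(|frame|), so with margin > 1 the
    first sample dominates the l1 norm of the remaining samples.
    """
    frame = np.asarray(frame, dtype=float)
    delta = margin * np.sum(np.abs(frame)) + 1e-12
    return np.concatenate(([delta], frame))

def recover_from_magnitude(mag, length):
    """Minimum-phase reconstruction from a magnitude spectrum via the cepstrum.

    mag    : magnitude of an even-length DFT of the (zero-padded) sequence
    length : number of samples of the sequence to return
    """
    nfft = len(mag)
    if nfft % 2 != 0:
        raise ValueError("this sketch assumes an even DFT length")
    # Real cepstrum: inverse DFT of the log-magnitude spectrum.
    c = np.fft.ifft(np.log(mag)).real
    # Fold the real cepstrum into a causal (complex) cepstrum.
    fold = np.zeros(nfft)
    fold[0] = 1.0
    fold[1:nfft // 2] = 2.0
    fold[nfft // 2] = 1.0
    # Exponentiate in the frequency domain and return to the time domain.
    x_hat = np.fft.ifft(np.exp(np.fft.fft(c * fold))).real
    return x_hat[:length]

# Illustrative usage on a synthetic frame (not speech data).
rng = np.random.default_rng(0)
frame = rng.standard_normal(16)
x = make_cdd(frame)
mag = np.abs(np.fft.fft(x, 4096))      # heavy zero-padding limits cepstral aliasing
x_hat = recover_from_magnitude(mag, len(x))
max_err = np.max(np.abs(x_hat[1:] - frame))
print("max abs error:", max_err)
```

In this sketch the accuracy depends on the DFT length, because the cepstrum is aliased when the transform is too short; that is one plausible reading of why a practical cepstrum implementation reaches high but limited accuracy.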


DOI: 10.21437/Interspeech.2018-1987

Cite as: Sainathan, A., Rudresh, S., Seelamantula, C.S. (2018) An Optimization Framework for Recovery of Speech from Phase-Encoded Spectrograms. Proc. Interspeech 2018, 741-745, DOI: 10.21437/Interspeech.2018-1987.


@inproceedings{Sainathan2018,
  author={Abhilash Sainathan and Sunil Rudresh and Chandra Sekhar Seelamantula},
  title={An Optimization Framework for Recovery of Speech from Phase-Encoded Spectrograms},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={741--745},
  doi={10.21437/Interspeech.2018-1987},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1987}
}