We propose a novel approach to find the locations of the multipulse sequence that approximates the speech source excitation. This approach is based on the notion of Most Singular Manifold (MSM) which is associated to the set of less predictable events. The MSM is formed by identifying (directly from the speech waveform) multiscale singularities which may correspond to significant impulsive excitations of the vocal tract. This identification is done through a multiscale measure of local predictability and the estimation of its associated singularity exponents. Once the pulse locations are found using the MSM, their amplitudes are computed using the second stage of the classical MultiPulse Excitation (MPE) coder. The multipulse sequence is then fed to the classical LPC synthesizer to reconstruct speech. The resulting MSM-based algorithm is shown to be significantly more efficient than MPE. We evaluate our algorithm using 1 hour of speech from the TIMIT database and compare its performances to MPE and a recent approach based on compressed sensing (CS). The results show that our algorithm yields similar perceptual quality than MPE and outperforms the CS method when the number of pulses is low.
Index Terms: Multipulse speech coding, source excitation approximation, multiscale signal processing, singularity exponents.
Bibliographic reference. Khanagha, Vahid / Daoudi, Khalid (2012): "Efficient multipulse approximation of speech excitation using the most singular manifold", In INTERSPEECH-2012, 875-878.