INTERSPEECH 2004 - ICSLP

We propose a new technique for modifying the time scale of speech using Independent Subspace Analysis (ISA). To carry out ISA, the single-channel mixture signal is converted to a time-frequency representation such as a spectrogram. Here, the spectrogram is generated by taking the Hartley or wavelet transform of overlapped frames of speech. We reduce the dimensionality of the autocorrelated original spectrogram using singular value decomposition. We then apply independent component analysis, using the JadeICA algorithm, to obtain an unmixing matrix. The overall spectrogram is assumed to result from the superposition of a number of unknown, statistically independent spectrograms. Using the unmixing matrix, independent sources such as temporal amplitude envelopes and frequency weights can be extracted from the spectrogram. Time-scaling of speech is carried out by resampling the independent temporal amplitude envelopes. Multiplying the independent frequency weights by the time-scaled temporal amplitude envelopes then yields time-scaled independent spectrograms. These independent spectrograms are summed, the inverse Hartley or wavelet transform of the summed spectrogram is taken, and the reconstructed time-domain frames are overlap-added to obtain the time-scaled speech. The quality of the time-scaled speech has been analyzed using Modified Bark Spectral Distortion (MBSD). From the MBSD score, one can infer that the time-scaled signal is less distorted.
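The resampling and recombination steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a real-valued spectrogram stored as a (frequency x frame) array, applies the SVD directly to that array, uses linear interpolation for resampling, and leaves out both the JadeICA rotation and the inverse transform with overlap-add. The function names `svd_components`, `resample_envelopes`, and `timescale_spectrogram`, and the parameters `rank` and `factor`, are hypothetical.

```python
import numpy as np


def svd_components(spectrogram, rank):
    """Rank-reduce a (n_freq, n_frames) spectrogram with the SVD.

    Returns frequency weights of shape (n_freq, rank) and temporal
    envelopes of shape (rank, n_frames). Assumption: the ICA rotation
    (JadeICA in the abstract) is omitted in this sketch.
    """
    u, s, vt = np.linalg.svd(spectrogram, full_matrices=False)
    freq_weights = u[:, :rank] * s[:rank]  # scale each basis by its singular value
    envelopes = vt[:rank, :]
    return freq_weights, envelopes


def resample_envelopes(envelopes, factor):
    """Linearly resample each envelope to round(n_frames * factor) frames."""
    n_frames = envelopes.shape[1]
    n_out = max(1, int(round(n_frames * factor)))
    old_t = np.linspace(0.0, 1.0, n_frames)
    new_t = np.linspace(0.0, 1.0, n_out)
    return np.vstack([np.interp(new_t, old_t, env) for env in envelopes])


def timescale_spectrogram(spectrogram, rank, factor):
    """Time-scale a spectrogram by resampling its temporal envelopes."""
    freq_weights, envelopes = svd_components(spectrogram, rank)
    scaled = resample_envelopes(envelopes, factor)
    # The matrix product equals the sum of the per-component spectrograms
    # (outer products of each frequency weight and its scaled envelope).
    return freq_weights @ scaled


# Hypothetical input: a synthetic rank-2 "spectrogram", 8 bins x 10 frames.
rng = np.random.default_rng(0)
spec = rng.random((8, 2)) @ rng.random((2, 10))
stretched = timescale_spectrogram(spec, rank=2, factor=2.0)
print(stretched.shape)
```

A `factor` above 1 lengthens the signal and a `factor` below 1 shortens it, while the frequency weights are left untouched, which is what keeps the spectral content of each component fixed during time-scaling.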
Bibliographic reference. Muralishankar, R. / Ramakrishnan, A. G. / Kaushik, Lakshmish N. (2004): "Time-scaling of speech using independent subspace analysis", In INTERSPEECH-2004, 2465-2468.