INTERSPEECH 2004 - ICSLP
We propose a new technique for modifying the time-scale of speech using Independent Subspace Analysis (ISA). To carry out ISA, the single-channel mixture signal is converted to a time-frequency representation such as a spectrogram. Here, the spectrogram is generated by applying the Hartley or wavelet transform to overlapped frames of speech. We reduce the dimensionality of the original spectrogram by applying singular value decomposition (SVD) to its autocorrelation. Independent component analysis (ICA), using the JADE algorithm, then yields an unmixing matrix. The overall spectrogram is assumed to result from the superposition of a number of unknown, statistically independent spectrograms. Using the unmixing matrix, independent components, namely temporal amplitude envelopes and frequency weights, are extracted from the spectrogram. Time-scaling is carried out by resampling the independent temporal amplitude envelopes. Multiplying the independent frequency weights by the time-scaled temporal amplitude envelopes gives time-scaled independent spectrograms. These independent spectrograms are summed, the inverse Hartley or wavelet transform of the summed spectrogram reconstructs the time-domain frames, and overlap-adding these frames yields the time-scaled speech. The quality of the time-scaled speech is evaluated using the Modified Bark Spectral Distortion (MBSD) measure; the MBSD scores indicate that the time-scaled signal has little distortion.
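The processing chain described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it covers only the Hartley variant (no wavelet transform); it swaps in symmetric FastICA with a tanh nonlinearity in place of JADE for brevity; it applies SVD to the mean-removed Hartley spectrogram, which is equivalent to eigendecomposing its autocorrelation (covariance) matrix; and the frame length, hop, periodic Hann window, number of components `k`, and linear interpolation of the envelopes are all assumed choices. Every function name here (`dht`, `isa_time_scale`, and so on) is hypothetical. The Hartley transform is convenient because the spectrogram is real-valued and the transform is its own inverse up to a factor of 1/N, so no phase has to be estimated at reconstruction.

```python
import numpy as np

def dht(x):
    # Discrete Hartley transform along axis 0: sum_n x[n] * cas(2*pi*n*k/N),
    # computed from the FFT as Re(F) - Im(F), since cas = cos + sin.
    f = np.fft.fft(x, axis=0)
    return f.real - f.imag

def idht(h):
    # The DHT is its own inverse up to a factor of 1/N.
    return dht(h) / h.shape[0]

def frame_signal(x, n, hop):
    # Overlapped frames stored as the columns of an (n, T) matrix.
    count = 1 + (len(x) - n) // hop
    return np.stack([x[i * hop:i * hop + n] for i in range(count)], axis=1)

def overlap_add(cols, hop):
    # Overlap-add the columns of an (n, T) matrix with the given hop size.
    n, count = cols.shape
    out = np.zeros((count - 1) * hop + n)
    for i in range(count):
        out[i * hop:i * hop + n] += cols[:, i]
    return out

def fastica(z, iters=200, seed=0):
    # Symmetric FastICA (tanh nonlinearity) on whitened rows z.
    # Assumption: used here as a stand-in for the JADE algorithm.
    k, t = z.shape
    w = np.linalg.qr(np.random.default_rng(seed).standard_normal((k, k)))[0]
    for _ in range(iters):
        g = np.tanh(w @ z)
        w = (g @ z.T) / t - np.diag((1.0 - g ** 2).mean(axis=1)) @ w
        u, _, vt = np.linalg.svd(w)
        w = u @ vt  # symmetric decorrelation keeps w orthogonal
    return w

def resample_rows(h, new_len):
    # Linear-interpolation resampling of each temporal envelope to new_len frames.
    old = np.linspace(0.0, 1.0, h.shape[1])
    new = np.linspace(0.0, 1.0, new_len)
    return np.stack([np.interp(new, old, row) for row in h])

def isa_time_scale(x, alpha, n=256, hop=128, k=8):
    # alpha > 1 lengthens (slows down) the signal; alpha < 1 shortens it.
    # Assumes hop = n / 2 so the periodic Hann windows sum to 1 in overlap-add.
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)
    X = dht(frame_signal(x, n, hop) * win[:, None])          # (n, T) Hartley spectrogram
    mean = X.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)  # dimensionality reduction
    T = X.shape[1]
    Z = Vt[:k] * np.sqrt(T)                     # whitened temporal components
    W = fastica(Z)                              # unmixing matrix
    H = W @ Z                                   # independent temporal amplitude envelopes
    A = (U[:, :k] * s[:k]) @ W.T / np.sqrt(T)   # frequency weights: X - mean ~ A @ H
    new_T = max(2, int(round(alpha * T)))
    Y = A @ resample_rows(H, new_T) + mean      # sum of time-scaled independent spectrograms
    return overlap_add(idht(Y), hop)            # inverse Hartley, then overlap-add
```

For example, `isa_time_scale(x, 1.5)` returns a signal roughly 1.5 times as long as `x`. Because the unmixing matrix is kept orthogonal, choosing `k` equal to the full rank and `alpha = 1` reproduces the input away from the first and last half-frame, which is a convenient sanity check; smaller `k` keeps only the dominant independent subspace.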
Bibliographic reference. Muralishankar, R. / Ramakrishnan, A. G. / Kaushik, Lakshmish N. (2004): "Time-scaling of speech using independent subspace analysis", In INTERSPEECH-2004, 2465-2468.