Spectral envelopes and harmonics are the building blocks of a speech signal. By estimating them, the individual speech sources in a mixture observation can be reconstructed and hence separated. A transcription gives the spoken content; more importantly, given models of the different speech sounds, it also describes the expected sequence of spectral envelopes. Our recently proposed single-microphone speech separation algorithm exploits this to derive the spectral envelope trajectories of the individual sources and remove interference accordingly. The correctness of the transcription is therefore critical to separation performance. This paper investigates the relationship between the correctness of transcription hypotheses and the orthogonality of the associated source estimates. An orthogonality measure is introduced to quantify the correlation between spectrograms. Experiments verify that the true underlying transcriptions lead to a distinctive orthogonality distribution that is distinguishable from that of counterfeit transcriptions. Accordingly, a transcription identification technique is developed, which identifies the true transcriptions in 99.74% of the experimental trials.
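The abstract does not define the orthogonality measure, so the sketch below is only an illustration under stated assumptions. It uses the normalized inner product (cosine similarity) of two magnitude spectrograms as a stand-in correlation measure: for nonnegative magnitudes it is 0 when the two source estimates occupy disjoint time-frequency cells (orthogonal) and 1 when one is a scaled copy of the other. The decision rule, which picks the hypothesis whose source estimates are most nearly orthogonal, is a simplification of the distribution-based identification described above. The function names `orthogonality_correlation` and `identify_transcription` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical representation: rows are frames, columns are frequency-bin magnitudes.
Spectrogram = Sequence[Sequence[float]]


def orthogonality_correlation(a: Spectrogram, b: Spectrogram) -> float:
    """Assumed measure: normalized inner product of two magnitude spectrograms.

    Returns 0.0 when the spectrograms share no time-frequency energy
    (orthogonal) and 1.0 when one is a positive multiple of the other.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("spectrograms must have the same shape")
    dot = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    norm_a = math.sqrt(sum(x * x for ra in a for x in ra))
    norm_b = math.sqrt(sum(y * y for rb in b for y in rb))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("spectrograms must have nonzero energy")
    return dot / (norm_a * norm_b)


def identify_transcription(
    estimates: Dict[str, Tuple[Spectrogram, Spectrogram]]
) -> str:
    """Simplified rule (an assumption): choose the transcription hypothesis
    whose two source estimates have the lowest correlation."""
    return min(estimates, key=lambda h: orthogonality_correlation(*estimates[h]))


if __name__ == "__main__":
    # Toy source estimates: the "true" pair has disjoint energy, the "fake" pair overlaps.
    true_pair = ([[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]])
    fake_pair = ([[1.0, 1.0], [1.0, 1.0]], [[2.0, 1.0], [1.0, 2.0]])
    print(identify_transcription({"true": true_pair, "fake": fake_pair}))  # prints "true"
```

In this toy case the "true" pair has a correlation of exactly 0, while the "fake" pair scores about 0.95, so the rule selects "true". In practice the spectrograms would come from the separation system's source estimates for each transcription hypothesis.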
Bibliographic reference. Lee, S. W. / Soong, Frank K. / Lee, Tan (2009): "Model-based speech separation: identifying transcription using orthogonality", In INTERSPEECH-2009, 1343-1346.