10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Model-Based Speech Separation: Identifying Transcription Using Orthogonality

S. W. Lee (1), Frank K. Soong (2), Tan Lee (1)

(1) Chinese University of Hong Kong, China
(2) Microsoft Research Asia, China

Spectral envelopes and harmonics are the building elements of a speech signal. By estimating these elements, individual speech sources in a mixture observation can be reconstructed and hence separated. A transcription gives the spoken content. More importantly, given models of the different speech sounds, it describes the expected sequence of spectral envelopes. Our recently proposed single-microphone speech separation algorithm exploits this to derive the spectral envelope trajectories of the individual sources and to remove interference accordingly. The correctness of the transcription is therefore critical to separation performance. This paper investigates the relationship between the correctness of transcription hypotheses and the orthogonality of the associated source estimates. An orthogonality measure is introduced to quantify the correlation between spectrograms. Experiments verify that the underlying true transcriptions lead to a salient orthogonality distribution that is distinguishable from the distribution produced by counterfeit transcriptions. Accordingly, a transcription identification technique is developed, which succeeds in identifying the true transcriptions in 99.74% of the experimental trials.
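The abstract does not give the form of the orthogonality measure. As a rough illustration of how the correlation between two spectrograms might be quantified, the sketch below assumes a measure of one minus the normalized inner product of two magnitude spectrograms. The function name `orthogonality` and the toy values are hypothetical and are not taken from the paper.

```python
import math


def orthogonality(spec_a, spec_b):
    """Return 1 - |normalized inner product| of two equally shaped spectrograms.

    Assumed, illustrative definition (not the paper's): 1.0 means the two
    spectrograms share no time-frequency energy, 0.0 means they are
    proportional to each other.
    """
    if len(spec_a) != len(spec_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(spec_a, spec_b)
    ):
        raise ValueError("spectrograms must have the same shape")

    # Flatten the time-frequency grids into plain vectors.
    flat_a = [value for row in spec_a for value in row]
    flat_b = [value for row in spec_b for value in row]

    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm_a = math.sqrt(sum(x * x for x in flat_a))
    norm_b = math.sqrt(sum(y * y for y in flat_b))

    if norm_a == 0.0 or norm_b == 0.0:
        # An all-zero spectrogram overlaps with nothing.
        return 1.0
    return 1.0 - abs(dot) / (norm_a * norm_b)


# Toy magnitude spectrograms (rows = frequency bins, columns = frames).
source_1 = [[1.0, 0.0], [0.0, 2.0]]
source_2 = [[0.0, 3.0], [1.0, 0.0]]

disjoint = orthogonality(source_1, source_2)   # no shared time-frequency cells
identical = orthogonality(source_1, source_1)  # fully correlated
```

Under this assumed definition, source estimates that occupy disjoint time-frequency cells score close to 1, while estimates that still share energy score lower. That is the kind of contrast the abstract reports between true and counterfeit transcriptions.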

Bibliographic reference. Lee, S. W. / Soong, Frank K. / Lee, Tan (2009): "Model-based speech separation: identifying transcription using orthogonality", in INTERSPEECH-2009, 1343-1346.