When speech is corrupted by other sound sources, certain spectro-temporal regions will be dominated by speech energy and others by the noise. Listeners are able to exploit these cues to achieve robust speech perception in adverse conditions. Inspired by this perceptual process, a 'speech fragment decoding' technique has shown promising robustness when handling multiple sound sources. This paper proposes an approach to estimating 'speechiness' - a degree of confidence that a spectro-temporal region is dominated by speech energy - using the modulation spectrogram. This additional knowledge is employed to steer the decoder towards selecting more reliable speech evidence in noise. Experiments show that the speechiness measure improves recognition accuracy in various noise conditions at 0 dB global signal-to-noise ratio.
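The abstract does not specify how speechiness is computed from the modulation spectrogram, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that sub-band amplitude envelopes are already available as a (channels, frames) array, and it scores each channel and analysis window by the fraction of its modulation energy that falls in a low-frequency band around the syllabic rate of speech. The function name `speechiness_map`, the 2-8 Hz band, and the window and hop sizes are all hypothetical choices made for this example.

```python
import numpy as np

def speechiness_map(envelopes, frame_rate, win=50, hop=10, band=(2.0, 8.0)):
    """Illustrative speechiness score in [0, 1] per channel and window.

    envelopes  : (channels, frames) array of non-negative sub-band envelopes
    frame_rate : envelope sampling rate in Hz
    win, hop   : analysis window length and hop, in frames (assumed values)
    band       : modulation band treated as speech-like, in Hz (assumed)
    """
    n_ch, n_fr = envelopes.shape
    # Modulation frequencies resolved by a window of `win` frames.
    freqs = np.fft.rfftfreq(win, d=1.0 / frame_rate)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    taper = np.hanning(win)
    starts = range(0, n_fr - win + 1, hop)
    scores = np.zeros((n_ch, len(starts)))
    for j, s in enumerate(starts):
        seg = envelopes[:, s:s + win]
        # Remove the per-channel mean so the DC term does not dominate.
        seg = seg - seg.mean(axis=1, keepdims=True)
        power = np.abs(np.fft.rfft(seg * taper, axis=1)) ** 2
        # Total modulation energy excluding the DC bin.
        total = power[:, 1:].sum(axis=1) + 1e-12
        scores[:, j] = power[:, in_band].sum(axis=1) / total
    return scores

# Toy input: channel 0 is modulated at 4 Hz (speech-like),
# channel 1 at 30 Hz (not speech-like under the assumed band).
frame_rate = 100.0
t = np.arange(200) / frame_rate
envelopes = np.vstack([
    1.0 + 0.8 * np.sin(2 * np.pi * 4.0 * t),
    1.0 + 0.8 * np.sin(2 * np.pi * 30.0 * t),
])
scores = speechiness_map(envelopes, frame_rate)
```

In this toy case the 4 Hz channel scores close to 1 and the 30 Hz channel close to 0. How such a confidence map would be passed to a fragment decoder is not described in the abstract and is left out of the sketch.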
Bibliographic reference. Ma, Ning / Green, Phil (2008): "A 'speechiness' measure to improve speech decoding in the presence of other sound sources", In INTERSPEECH-2008, 1285-1288.