Speech Activity Detection (SAD) is a well-researched problem for communication, command and control applications, where audio segments are typically short and solutions have been proposed for both noisy and clean environments. In this study, we investigate the SAD problem using NASA's Apollo space mission data. Unlike traditional speech corpora, the Apollo audio recordings are extensive from a longitudinal perspective (i.e., 6 to 12 days each). From an SAD perspective, the data pose several challenges: (i) noise distortion with variable SNR, (ii) channel distortion, and (iii) extended periods of non-speech activity. We use the recently proposed Combo-SAD, which performed remarkably well in the DARPA RATS evaluations, as our baseline system. Our analysis reveals that Combo-SAD performs well when speech and pause durations are balanced within an audio segment, but deteriorates significantly when speech is sparse or absent. To mitigate this problem, we propose a simple yet efficient technique that builds an alternative model of speech using data from a separate corpus and embeds this information within the Combo-SAD framework. Our experiments show that the proposed approach has a major impact on SAD performance (+30% absolute), especially in audio segments that contain sparse or no speech.
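The abstract does not spell out the decision rule, so the following is only a toy sketch under our own simplifying assumptions. It is meant to show why a detector that adapts entirely to the segment can fail when speech is absent. Each segment is reduced to one score per frame. The baseline is approximated by a two-cluster split fit only to that segment. The "alternative model of speech" is approximated by a speech centroid fixed in advance from other data. The names `per_segment_two_class`, `anchored_two_class`, and `speech_center`, the value 6.0, and the synthetic Gaussian scores are all hypothetical. This is not the paper's implementation of Combo-SAD or of the proposed method.

```python
import numpy as np


def per_segment_two_class(scores: np.ndarray, iters: int = 20) -> np.ndarray:
    """Toy baseline-style rule (assumption): split one segment's frame scores
    into two clusters with 1-D 2-means and call the higher cluster speech.
    Both centroids adapt to the segment, so even a noise-only segment is split."""
    centers = np.array([scores.min(), scores.max()], dtype=float)
    for _ in range(iters):
        # Assign each frame to its nearest centroid (0 = low, 1 = high).
        labels = np.abs(scores[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = scores[labels == k].mean()
    labels = np.abs(scores[:, None] - centers[None, :]).argmin(axis=1)
    return labels == 1


def anchored_two_class(scores: np.ndarray, speech_center: float,
                       iters: int = 20) -> np.ndarray:
    """Toy variant (assumption): the speech centroid is fixed in advance,
    hypothetically estimated from a separate speech corpus; only the
    non-speech centroid adapts, so frames count as speech only if they
    resemble the external speech model."""
    nonspeech_center = float(scores.min())
    for _ in range(iters):
        is_speech = np.abs(scores - speech_center) < np.abs(scores - nonspeech_center)
        if np.any(~is_speech):
            nonspeech_center = float(scores[~is_speech].mean())
    return np.abs(scores - speech_center) < np.abs(scores - nonspeech_center)


# Usage on synthetic scores (not Apollo data).
rng = np.random.default_rng(0)
noise_only = rng.normal(0.0, 1.0, 1000)            # segment with no speech at all
print(per_segment_two_class(noise_only).mean())    # roughly half flagged as speech
print(anchored_two_class(noise_only, 6.0).mean())  # close to zero
```

Pinning the speech centroid makes the decision depend on what speech looks like in other data rather than on the segment alone. In this toy setting, that is what keeps a speech-free segment from being forcibly split into "speech" and "non-speech" halves.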
Bibliographic reference. Ziaei, Ali / Kaushik, Lakshmish / Sangwan, Abhijeet / Hansen, John H. L. / Oard, Douglas W. (2014): "Speech activity detection for NASA Apollo space missions: challenges and solutions", In INTERSPEECH-2014, pp. 1544-1548.