It is common for subjects to produce speech while performing a physical task where speech technology may be used. Variabilities are introduced to speech since physical task can influence human speech production. These variabilities degrade the performance of most speech systems. It is vital to detect speech under physical stress variabilities for subsequent algorithm processing. This study presents a method for detecting physical task stress from speech. Inspired by the fact that i-vectors can generally model total factors from speech, a state-of-the-art i-vector framework is investigated with MFCCs and our previously formulated TEO-CB-Auto-Env features for neutral/physical task stress detection. Since MFCCs are derived from a linear speech production model and TEO-CB-Auto-Env features employ a nonlinear operator, these two features are believed to have complementary effects on physical task stress detection. Two alternative fusion strategies (feature-level and score-level fusion) are investigated to validate this hypothesis. Experiments over the UT-Scope Physical Corpus demonstrate that a relative accuracy gain of 2.68% is obtained when fusing different feature based i-vectors. An additional relative performance boost with of 6.52% in accuracy is achieved using score level fusion.
Bibliographic reference. Zhang, Chunlei / Liu, Gang / Yu, Chengzhu / Hansen, John H. L. (2015): "I-vector based physical task stress detection with different fusion strategies", In INTERSPEECH-2015, 2689-2693.