13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Prof-Life-Log: Audio Environment Detection for Naturalistic Audio Streams

Ali Ziaei, Abhijeet Sangwan, John H. L. Hansen

Center for Robust Speech Systems (CRSS), Department of Electrical Engineering, University of Texas at Dallas, Richardson, TX, USA

In this study, we develop a new system for real-world audio environment matching. Environment detection within unknown audio streams requires a system that operates in an unsupervised manner, since it will encounter unknown environments without prior information. In addition, the overall solution should be computationally efficient for large audio collections. In the proposed approach, a Gaussian mixture model (GMM) is trained on large amounts of unlabeled audio data and used as a background acoustic model. Subsequently, an acoustic signature vector (ASV) is computed for each environment; the ASV is designed to capture the unique acoustic characteristics of that environment. Using the ASVs, we demonstrate that it is possible to compute an effective similarity measure between two acoustic environments. We evaluate the proposed system on real-world audio data and compare it to a traditional GMM-UBM (Universal Background Model) system. Experiments show that our system achieves an equal error rate (EER) that is 35% better than that of the baseline GMM-UBM system.
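The abstract does not say how the ASV is derived from the background GMM, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes an already trained background GMM with diagonal covariances, takes the ASV to be the L2-normalised average posterior occupancy of each GMM component over an environment's feature frames, and uses cosine similarity as the comparison measure. All function names are hypothetical. The `equal_error_rate` helper computes the standard EER metric quoted in the abstract.

```python
import numpy as np

def component_posteriors(frames, weights, means, variances):
    """Posterior of each diagonal-covariance GMM component for each frame.

    frames: (T, D); weights: (K,); means, variances: (K, D).
    Returns a (T, K) array whose rows sum to 1.
    """
    diff = frames[:, None, :] - means[None, :, :]            # (T, K, D)
    log_gauss = -0.5 * np.sum(diff ** 2 / variances + np.log(2 * np.pi * variances), axis=2)
    log_joint = np.log(weights)[None, :] + log_gauss         # (T, K)
    log_joint -= log_joint.max(axis=1, keepdims=True)        # stabilise before exponentiating
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

def acoustic_signature(frames, weights, means, variances):
    """Hypothetical ASV: mean component occupancy over all frames, L2-normalised."""
    asv = component_posteriors(frames, weights, means, variances).mean(axis=0)
    return asv / np.linalg.norm(asv)

def similarity(asv_a, asv_b):
    """Cosine similarity between two unit-length signatures (assumed measure)."""
    return float(np.dot(asv_a, asv_b))

def equal_error_rate(target_scores, nontarget_scores):
    """EER: threshold where false-rejection and false-acceptance rates are closest."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        frr = np.mean(target_scores < t)      # same-environment pairs rejected
        far = np.mean(nontarget_scores >= t)  # different-environment pairs accepted
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer

# Toy background GMM (hypothetical parameters) and three synthetic "environments".
means = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])
weights = np.full(4, 0.25)
variances = np.ones_like(means)
rng = np.random.default_rng(0)
asv_a1 = acoustic_signature(rng.normal(loc=means[0], size=(200, 2)), weights, means, variances)
asv_a2 = acoustic_signature(rng.normal(loc=means[0], size=(200, 2)), weights, means, variances)
asv_b = acoustic_signature(rng.normal(loc=means[3], size=(200, 2)), weights, means, variances)
print(similarity(asv_a1, asv_a2), similarity(asv_a1, asv_b))
```

Because the background model is trained once on unlabeled data, a new environment only requires a pass of posterior computation to obtain its signature, which is consistent with the abstract's emphasis on unsupervised operation and efficiency; the occupancy-vector form of the ASV here is an illustrative choice.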

Index Terms: Audio Environment Detection, Acoustic Signature, Real-world audio data, Prof-Life-Log


Bibliographic reference. Ziaei, Ali / Sangwan, Abhijeet / Hansen, John H. L. (2012): "Prof-life-log: audio environment detection for naturalistic audio streams", In INTERSPEECH-2012, 2514-2517.