The problem of context recognition from mobile audio data is considered. We consider ten different audio contexts (such as car, bus, office, and outdoors) prevalent in daily-life situations. We choose mel-frequency cepstral coefficient (MFCC) parametrization and present an extensive comparison of six different classifiers: k-nearest neighbor (kNN), vector quantization (VQ), Gaussian mixture models trained with both the maximum likelihood (GMM-ML) and maximum mutual information (GMM-MMI) criteria, GMM supervector support vector machine (GMM-SVM), and, finally, SVM with a generalized linear discriminant sequence (GLDS) kernel (GLDS-SVM). After all parameter optimizations, the GMM-MMI and VQ classifiers perform best, with 52.01 % and 50.34 % context identification rates, respectively, using 3-second data records. Our analysis further reveals that no single classifier is uniformly superior when class-, user-, or phone-specific accuracies are considered.
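To illustrate the VQ classifier mentioned above, the following is a minimal sketch (not the authors' implementation): one codebook per context is trained with k-means on that context's feature vectors, and a test segment is assigned to the context whose codebook yields the lowest average quantization distortion. The synthetic 12-dimensional vectors stand in for MFCCs, and all function names are illustrative assumptions.

```python
import numpy as np

def train_codebook(features, codebook_size=4, iters=20, seed=0):
    """Plain k-means: returns `codebook_size` centroids for one context."""
    rng = np.random.default_rng(seed)
    codebook = features[rng.choice(len(features), codebook_size, replace=False)]
    for _ in range(iters):
        # assign each feature vector to its nearest centroid
        dists = np.linalg.norm(features[:, None] - codebook[None], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned vectors
        for k in range(codebook_size):
            if np.any(labels == k):
                codebook[k] = features[labels == k].mean(axis=0)
    return codebook

def avg_distortion(features, codebook):
    """Mean distance from each vector to its nearest codebook entry."""
    dists = np.linalg.norm(features[:, None] - codebook[None], axis=2)
    return dists.min(axis=1).mean()

def classify(segment, codebooks):
    """Pick the context whose codebook quantizes the segment best."""
    return min(codebooks, key=lambda c: avg_distortion(segment, codebooks[c]))

# Synthetic 12-dim "MFCC" streams for two example contexts
rng = np.random.default_rng(1)
train = {"car": rng.normal(0.0, 1.0, (300, 12)),
         "office": rng.normal(3.0, 1.0, (300, 12))}
codebooks = {c: train_codebook(x) for c, x in train.items()}
test_segment = rng.normal(3.0, 1.0, (50, 12))  # drawn like "office"
print(classify(test_segment, codebooks))  # prints: office
```

In the paper's setting, a 3-second test record corresponds to a short sequence of MFCC vectors scored against each context's codebook in this manner.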
Cite as: Kinnunen, T., Saeidi, R., Leppänen, J., Saarinen, J.P. (2012) Audio context recognition in variable mobile environments from short segments using speaker and language recognizers. Proc. The Speaker and Language Recognition Workshop (Odyssey 2012), 304-311
@inproceedings{kinnunen12_odyssey,
  author    = {Tomi Kinnunen and Rahim Saeidi and Jussi Leppänen and Jukka P. Saarinen},
  title     = {{Audio context recognition in variable mobile environments from short segments using speaker and language recognizers}},
  booktitle = {Proc. The Speaker and Language Recognition Workshop (Odyssey 2012)},
  year      = {2012},
  pages     = {304--311}
}