15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Simultaneous Gender Classification and Voice Activity Detection Using Deep Neural Networks

Hiroshi Fujimura

Toshiba, Japan

This paper proposes a novel technique for performing gender classification and voice activity detection (VAD) simultaneously using deep neural networks (DNNs). Speaker information such as gender is important in some speech recognition applications, such as recommendation systems and trend analysis. Gender classification is usually applied after VAD has detected the speech segments, and previous studies have treated the two tasks separately. Over the past few years, DNNs have been applied to both tasks as powerful classifiers. However, running two separate DNNs incurs a large computational cost. In our method, a single DNN classifies each frame as male, female, or silence. The frame-level classification results are used for both gender classification and VAD. For VAD, the sum of the male and female posterior probabilities from the DNN is used as the voice posterior probability. Gender classification is likewise based on the output of the same DNN classifier. Experimental results show that the proposed method achieves high accuracy for both gender classification and VAD.
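The way one three-class output can serve both tasks may be easier to see in a small sketch. The Python below assumes the DNN has already produced per-frame posteriors in the order (male, female, silence). The voice posterior is the sum of the male and female posteriors, as the abstract states. Everything else is a labeled assumption: the class order, the 0.5 threshold, the function names, and in particular the utterance-level gender rule (comparing summed male and female posteriors over voiced frames), which the abstract does not specify.

```python
from typing import List, Optional, Sequence

# Assumed order of the DNN output classes (hypothetical, not from the paper).
MALE, FEMALE, SILENCE = 0, 1, 2


def voice_posteriors(frames: Sequence[Sequence[float]]) -> List[float]:
    """Per-frame voice posterior, P(male) + P(female), as in the abstract."""
    return [p[MALE] + p[FEMALE] for p in frames]


def detect_voice(frames: Sequence[Sequence[float]],
                 threshold: float = 0.5) -> List[bool]:
    """Frame-level VAD decision. The threshold value is an assumption."""
    return [v >= threshold for v in voice_posteriors(frames)]


def classify_gender(frames: Sequence[Sequence[float]],
                    threshold: float = 0.5) -> Optional[str]:
    """Utterance-level gender decision.

    Assumed rule (not taken from the paper): sum the male and female
    posteriors over frames judged to be voice and pick the larger total.
    Returns None when no frame is judged to be voice.
    """
    is_voice = detect_voice(frames, threshold)
    voiced = [p for p, keep in zip(frames, is_voice) if keep]
    if not voiced:
        return None
    male_score = sum(p[MALE] for p in voiced)
    female_score = sum(p[FEMALE] for p in voiced)
    return "male" if male_score >= female_score else "female"


# Made-up posteriors for four frames: (male, female, silence).
posteriors = [
    (0.05, 0.05, 0.90),
    (0.60, 0.30, 0.10),
    (0.70, 0.20, 0.10),
    (0.20, 0.10, 0.70),
]

print(detect_voice(posteriors))     # [False, True, True, False]
print(classify_gender(posteriors))  # male
```

Because both decisions read the same posteriors, only one forward pass of the network is needed per frame, which is the cost saving the abstract argues for.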

Bibliographic reference.  Fujimura, Hiroshi (2014): "Simultaneous gender classification and voice activity detection using deep neural networks", In INTERSPEECH-2014, 1139-1143.