Room occupancy estimation technology has been shown to significantly reduce building energy costs. However, speech-based occupancy estimation has not been well explored. In this paper, we investigate energy mode and babble speaker count methods for estimating both small and large crowds in a party-mode room setting. We also examine how the distance between speakers and the microphone affects the estimation accuracy of each method. We then propose a novel entropy-based method that is invariant to speaker identity and to speaker position in the room. Evaluations on synthetic crowd speech generated using the TIMIT corpus show that acoustic volume features are less affected by distance, and that our proposed method outperforms existing methods across a range of conditions.
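As a rough illustration of the kind of pipeline the abstract describes, the sketch below mixes several single-speaker signals into a synthetic crowd signal and computes an entropy feature from it. It is not the paper's method: the abstract does not say how crowd speech is synthesized or how the entropy is defined. The inverse-distance gain (a free-field assumption), the choice of normalized spectral entropy, the frame sizes, and the function names `mix_crowd` and `spectral_entropy` are all assumptions made for this example.

```python
import numpy as np


def mix_crowd(signals, distances):
    """Sum equal-length speaker signals into one crowd signal.

    Assumption for illustration: each speaker is attenuated by
    1/distance (free-field propagation, no reverberation).
    """
    signals = np.asarray(signals, dtype=float)
    gains = 1.0 / np.asarray(distances, dtype=float)
    return (gains[:, None] * signals).sum(axis=0)


def spectral_entropy(x, frame_len=512, hop=256):
    """Mean normalized spectral entropy over frames, in [0, 1].

    Assumption for illustration: this is a generic spectral entropy,
    not necessarily the entropy feature used in the paper.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    max_entropy = np.log2(frame_len // 2 + 1)  # entropy of a flat spectrum
    entropies = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        total = power.sum()
        if total <= 0:
            continue  # skip silent frames
        p = power / total          # treat the spectrum as a distribution
        p = p[p > 0]               # avoid log2(0)
        entropies.append(-(p * np.log2(p)).sum() / max_entropy)
    return float(np.mean(entropies)) if entropies else 0.0


if __name__ == "__main__":
    # Stand-in signals (random noise); real experiments would use
    # utterances from a speech corpus such as TIMIT.
    rng = np.random.default_rng(0)
    speakers = rng.standard_normal((4, 16000))
    crowd = mix_crowd(speakers, distances=[1.0, 2.0, 3.0, 4.0])
    print(spectral_entropy(crowd))
```

Because the spectrum is normalized to sum to one in each frame, this particular feature does not change when the whole signal is scaled by a constant gain. Whether that property corresponds to the invariance claimed in the abstract is not something the abstract itself establishes.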
Cite as: Chen, S., Epps, J., Ambikairajah, E., Le, P.N. (2017) An Investigation of Crowd Speech for Room Occupancy Estimation. Proc. Interspeech 2017, 324-328, doi: 10.21437/Interspeech.2017-70
@inproceedings{chen17c_interspeech,
  author={Siyuan Chen and Julien Epps and Eliathamby Ambikairajah and Phu Ngoc Le},
  title={{An Investigation of Crowd Speech for Room Occupancy Estimation}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={324--328},
  doi={10.21437/Interspeech.2017-70}
}