The ability to count the number of words spoken by an individual over long durations is important to researchers in language development, healthcare, education, and related fields. In this study, we build a speech system that computes daily word counts using data from the Prof-Life-Log corpus. The task is challenging because typical Prof-Life-Log audio files are 8 to 16 hours long, recorded continuously with the LENA device. The device is worn by the primary speaker and captures all of the speaker's daily interactions in fine detail. The recordings contain a wide variety of noise types at varying signal-to-noise ratios (SNR), including large crowds, babble, and competing secondary speakers. We develop a word-count estimation (WCE) system based on syllable detection, using the method proposed by Wang and Narayanan as the baseline system. We propose multiple modifications to the original algorithm to improve its effectiveness in noise. In particular, we incorporate speech activity detection to remove non-speech from analysis and enhancement techniques to improve signal quality for better syllable detection. We also investigate features derived from syllable detection for better word count estimation. The proposed method shows significant improvement over the baseline.
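The general idea of syllable-based word-count estimation can be illustrated with a minimal Python sketch. This is not the Wang and Narayanan algorithm, nor the system proposed here: it assumes a simple short-time energy envelope, treats an energy threshold as a crude stand-in for speech activity detection, counts envelope peaks as syllables, and converts syllables to words with a fixed ratio. All function names and parameter values (`frame_len`, `threshold`, `syllables_per_word`) are hypothetical choices made for illustration.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def count_syllable_peaks(energies, threshold):
    """Count local maxima of the energy envelope that exceed the threshold.

    The threshold is an assumed, crude speech activity gate: frames at or
    below it are treated as non-speech and cannot contribute a syllable.
    """
    peaks = 0
    for i in range(1, len(energies) - 1):
        is_speech = energies[i] > threshold
        rises = energies[i] > energies[i - 1]
        falls = energies[i] >= energies[i + 1]
        if is_speech and rises and falls:
            peaks += 1
    return peaks


def estimate_word_count(samples, frame_len=400, threshold=0.01,
                        syllables_per_word=1.5):
    """Map a syllable count to a word count with a fixed, assumed ratio."""
    energies = frame_energies(samples, frame_len)
    syllables = count_syllable_peaks(energies, threshold)
    return syllables / syllables_per_word
```

In a real system the fixed ratio would typically be replaced by a mapping learned from labeled data, and the energy gate by a dedicated speech activity detector; the sketch only shows how the stages connect.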
Bibliographic reference. Ziaei, Ali / Sangwan, Abhijeet / Hansen, John H. L. (2014): "A speech system for estimating daily word counts", In INTERSPEECH-2014, 880-884.