ISCA Archive CHiME 2018

Robust speaker diarization and recognition in naturalistic data streams: Challenges for multi-speaker tasks & learning spaces

John H L Hansen

Speech technology is advancing beyond general speech recognition for voice command and telephone applications. The emergence of many voice-enabled systems has created a need for more effective distant speech capture and for automatic speech and speaker recognition. The ability to employ speech and language technology to assess human-to-human interactions is opening up new research paradigms that can have a profound impact on assessing human interaction, including personal communication traits, and can contribute to improving the quality of life and educational experience of individuals. In this talk, we will explore recent research trends in automatic audio diarization and speaker recognition for audio streams that include multiple tracks, speakers, and environments with distant speech capture. Specifically, we will consider (i) the Prof-Life-Log corpus, (ii) education-based child and student Peer-Led Team Learning, and (iii) Apollo-11 massive multi-track audio processing (19,000 hours of data). These domains will be discussed in the context of the CHiME workshops in terms of algorithmic advancements, as well as directions for continued research.


Cite as: Hansen, J.H.L. (2018) Robust speaker diarization and recognition in naturalistic data streams: Challenges for multi-speaker tasks & learning spaces. Proc. 5th International Workshop on Speech Processing in Everyday Environments (CHiME 2018).

@inproceedings{hansen18_chime,
  author={John H L Hansen},
  title={{Robust speaker diarization and recognition in naturalistic data streams: Challenges for multi-speaker tasks \& learning spaces}},
  year=2018,
  booktitle={Proc. 5th International Workshop on Speech Processing in Everyday Environments (CHiME 2018)},
  pages={}
}