Multi-Talker Speech Recognition Based on Blind Source Separation with ad hoc Microphone Array Using Smartphones and Cloud Storage

Keiko Ochi, Nobutaka Ono, Shigeki Miyabe, Shoji Makino


In this paper, we present a multi-talker speech recognition system based on blind source separation with an ad hoc microphone array consisting of smartphones and cloud storage. In this system, a mixture of voices from multiple speakers is recorded by each speaker’s smartphone and automatically transferred to online cloud storage. Our prototype system is implemented with iPhones and Dropbox. Although the signals recorded by different iPhones are not synchronized, a blind synchronization technique compensates for both the time offsets and the sampling frequency mismatch between the recordings. Auxiliary-function-based independent vector analysis then separates the synchronized mixture into the individual speakers’ voices. Finally, automatic speech recognition is applied to transcribe the separated speech. In an experimental evaluation of the multi-talker speech recognition system using the Julius speech recognizer, we confirm that the system effectively reduces speech overlap and improves speech recognition performance.
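To make the synchronization step concrete, the following minimal sketch estimates only the time offset between two recordings of the same scene by locating the peak of their cross-correlation. It is an illustration under simplifying assumptions, not the blind synchronization technique used in the paper: it assumes both devices share exactly the same sampling rate and does not address sampling frequency mismatch. The function name `estimate_offset` and the synthetic signals are hypothetical.

```python
import numpy as np


def estimate_offset(ref, other):
    """Estimate how many samples `other` lags behind `ref`.

    Illustrative only: assumes both signals share one sampling rate,
    so sampling frequency mismatch is NOT compensated here.
    """
    ref = np.asarray(ref, dtype=float)
    other = np.asarray(other, dtype=float)
    # Full cross-correlation; output index i corresponds to
    # lag = i - (len(ref) - 1).
    corr = np.correlate(other, ref, mode="full")
    return int(np.argmax(corr)) - (len(ref) - 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(2000)  # stand-in for one device's recording
    delay = 37                       # hypothetical offset in samples
    # Second "device" starts recording the same signal 37 samples late.
    other = np.concatenate((np.zeros(delay), ref))[: len(ref)]
    print(estimate_offset(ref, other))
```

A positive result means the second recording lags the first, so dropping that many leading samples from it would roughly align the two before separation. For long recordings, an FFT-based correlation would be a more practical choice than the direct computation shown here.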


DOI: 10.21437/Interspeech.2016-758

Cite as

Ochi, K., Ono, N., Miyabe, S., Makino, S. (2016) Multi-Talker Speech Recognition Based on Blind Source Separation with ad hoc Microphone Array Using Smartphones and Cloud Storage. Proc. Interspeech 2016, 3369-3373.

BibTeX
@inproceedings{Ochi+2016,
author={Keiko Ochi and Nobutaka Ono and Shigeki Miyabe and Shoji Makino},
title={Multi-Talker Speech Recognition Based on Blind Source Separation with ad hoc Microphone Array Using Smartphones and Cloud Storage},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-758},
url={http://dx.doi.org/10.21437/Interspeech.2016-758},
pages={3369--3373}
}