ISCA Archive Interspeech 2009

Discovering keywords from cross-modal input: ecological vs. engineering methods for enhancing acoustic repetitions

Guillaume Aimetti, Roger K. Moore, L. ten Bosch, Okko Johannes Räsänen, Unto Kalervo Laine

This paper introduces a computational model that automatically segments acoustic speech data and builds internal representations of keyword classes from cross-modal (acoustic and pseudo-visual) input. Acoustic segmentation is achieved using a novel dynamic time warping (DTW) technique, and the focus of this paper is on recent investigations into enhancing the identification of repeating portions of speech. This ongoing research is inspired by current cognitive views of early language acquisition and therefore strives for ecological plausibility, in an attempt to build more robust speech recognition systems. Results show that an ad-hoc, computationally engineered solution can aid the discovery of repeating acoustic patterns. However, we show that this improvement can be simulated in a more ecologically valid way.
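For readers unfamiliar with dynamic time warping, the sketch below shows only the classic DTW distance between two sequences of acoustic feature frames. It is not the authors' novel DTW variant, which the abstract does not describe. The frame representation (lists of floats), the Euclidean local cost, and the function names `euclidean` and `dtw_distance` are illustrative assumptions.

```python
import math
from typing import Sequence

# Assumption: each frame is a plain feature vector (e.g. one MFCC frame).
Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Local cost between two frames (assumed Euclidean distance)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw_distance(x: Sequence[Frame], y: Sequence[Frame]) -> float:
    """Classic DTW: minimum cumulative cost of aligning x with y.

    This is the textbook recurrence, not the paper's own variant.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(x[i - 1], y[j - 1])
            # Allowed steps: insertion, deletion, or diagonal match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    a = [[0.0], [1.0], [2.0]]
    b = [[0.0], [0.0], [1.0], [2.0], [2.0]]  # a time-stretched copy of a
    print(dtw_distance(a, b))  # 0.0: the stretch is absorbed by the warping
```

Because the warping path can repeat frames, a slower rendition of the same pattern aligns at zero cost. This tolerance to tempo variation is what makes DTW-style matching a natural starting point for spotting repeated word-like segments across utterances.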


doi: 10.21437/Interspeech.2009-340

Cite as: Aimetti, G., Moore, R.K., Bosch, L.t., Räsänen, O.J., Laine, U.K. (2009) Discovering keywords from cross-modal input: ecological vs. engineering methods for enhancing acoustic repetitions. Proc. Interspeech 2009, 1171-1174, doi: 10.21437/Interspeech.2009-340

@inproceedings{aimetti09_interspeech,
  author={Guillaume Aimetti and Roger K. Moore and L. ten Bosch and Okko Johannes Räsänen and Unto Kalervo Laine},
  title={{Discovering keywords from cross-modal input: ecological vs. engineering methods for enhancing acoustic repetitions}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={1171--1174},
  doi={10.21437/Interspeech.2009-340}
}