Deep belief networks (DBNs) are normally trained on large data sets. Our goal is to predict traces of the tongue surface in ultrasound images of human speech. Hand-tracing is labor-intensive, and the data set is highly imbalanced because many images are extremely similar. We propose a bootstrapping method that handles this imbalance by iteratively selecting a small subset of images to be hand-traced (thereby reducing human labor time) and then (re)training the DBN. An entropy-based diversity measure is used for the initial selection. The method achieves more than a two-fold reduction in the human time required for tracing while maintaining human-level accuracy.
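As a rough illustration of what an entropy-based initial selection might look like, the sketch below greedily picks images so that the pooled intensity histogram of the chosen set has maximal Shannon entropy. This is an assumption made for illustration only: the abstract does not specify the actual diversity measure, and the function names (`histogram`, `entropy`, `select_diverse_subset`), the 16-bin quantization, and the flat-list image representation are all hypothetical.

```python
import math


def histogram(pixels, bins=16, max_value=256):
    """Count pixel intensities (integers in [0, max_value)) into equal-width bins."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    return counts


def entropy(counts):
    """Shannon entropy, in bits, of a histogram given as a list of counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def select_diverse_subset(images, k, bins=16):
    """Greedily choose up to k image indices (hypothetical selection rule).

    At each step, add the image that maximizes the entropy of the pooled
    histogram of all images chosen so far. Ties go to the lowest index.
    """
    hists = [histogram(img, bins) for img in images]
    pooled = [0] * bins
    chosen = []
    remaining = set(range(len(images)))
    for _ in range(min(k, len(images))):
        best = max(
            sorted(remaining),
            key=lambda i: entropy([a + b for a, b in zip(pooled, hists[i])]),
        )
        chosen.append(best)
        remaining.remove(best)
        pooled = [a + b for a, b in zip(pooled, hists[best])]
    return chosen


# Toy data: three near-identical dark frames and one bright frame.
frames = [[0, 0, 0, 0], [0, 0, 0, 0], [255, 255, 255, 255], [0, 0, 0, 0]]
subset = select_diverse_subset(frames, k=2)
```

On this toy input the second pick is the bright frame rather than another duplicate dark frame, which is the behavior one would want when many images are extremely similar. The selected indices would then be hand-traced and used for the first round of DBN training.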
Index Terms: deep belief networks, ultrasound imaging, tongue imaging, speech processing, bootstrapping, class imbalance problem
Bibliographic reference. Berry, Jeff / Fasel, Ian / Fadiga, Luciano / Archangeli, Diana (2012): "Training deep nets with imbalanced and unlabeled data", In INTERSPEECH-2012, 1756-1759.