Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks

Yun Wang, Juncheng Li, Florian Metze


Many sequence learning tasks require the localization of certain events in sequences. Because it can be expensive to obtain strong labeling that specifies the starting and ending times of the events, modern systems are often trained with weak labeling without explicit timing information. Multiple instance learning (MIL) is a popular framework for learning from weak labeling. In a common scenario of MIL, it is necessary to choose a pooling function to aggregate the predictions for the individual steps of the sequences. In this paper, we compare the "max" and "noisy-or" pooling functions on a speech recognition task and a sound event detection task. We find that max pooling is able to localize phonemes and sound events, while noisy-or pooling fails. We provide a theoretical explanation of the different behavior of the two pooling functions on sequence learning tasks.
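For readers unfamiliar with the two pooling functions named in the abstract, the following is a minimal sketch of their standard textbook definitions, not code from the paper. Max pooling takes the largest frame-level probability as the sequence-level probability. Noisy-or pooling treats the frames as independent and computes the probability that at least one frame is positive. The function names and the example probabilities are hypothetical and chosen only for illustration.

```python
from math import prod
from typing import Sequence


def max_pool(frame_probs: Sequence[float]) -> float:
    """Sequence-level probability is the largest frame-level probability."""
    return max(frame_probs)


def noisy_or_pool(frame_probs: Sequence[float]) -> float:
    """Probability that at least one frame is positive, treating the
    frames as independent: 1 - prod(1 - p_i)."""
    return 1.0 - prod(1.0 - p for p in frame_probs)


# Hypothetical frame-level probabilities for a single event class.
peaked = [0.1, 0.2, 0.9, 0.1]   # one confident frame
diffuse = [0.05] * 100          # many weakly active frames

for name, probs in [("peaked", peaked), ("diffuse", diffuse)]:
    print(name, max_pool(probs), noisy_or_pool(probs))
```

On the hypothetical "diffuse" input, noisy-or pooling yields a sequence-level probability near 1 even though no single frame is confident, whereas max pooling stays at 0.05. This is only an arithmetic property of the definitions; the paper's own theoretical explanation should be consulted for why noisy-or fails to localize.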


DOI: 10.21437/Interspeech.2018-990

Cite as: Wang, Y., Li, J., Metze, F. (2018) Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks. Proc. Interspeech 2018, 1339-1343, DOI: 10.21437/Interspeech.2018-990.


@inproceedings{Wang2018,
  author={Yun Wang and Juncheng Li and Florian Metze},
  title={Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1339--1343},
  doi={10.21437/Interspeech.2018-990},
  url={http://dx.doi.org/10.21437/Interspeech.2018-990}
}