Adding New Classes without Access to the Original Training Data with Applications to Language Identification

Hagai Taitelbaum, Ehud Ben-Reuven, Jacob Goldberger


In this study we address the problem of adding new classes to an existing neural network classifier. We assume that new training data containing the new classes is available. In many applications, the datasets used to train machine learning algorithms contain confidential information that cannot be accessed while the class set is being extended. We propose a method for training an extended class-set classifier using only examples labeled with the new classes, while avoiding the problem of forgetting the original classes. This incremental training method is applied to the problem of language identification. We report results on the 50-language NIST 2015 dataset, where we were able to classify all the languages even though only some of the languages were available during the first training phase and the remaining languages were available only during the second phase.


DOI: 10.21437/Interspeech.2018-1342

Cite as: Taitelbaum, H., Ben-Reuven, E., Goldberger, J. (2018) Adding New Classes without Access to the Original Training Data with Applications to Language Identification. Proc. Interspeech 2018, 1808-1812, DOI: 10.21437/Interspeech.2018-1342.


@inproceedings{Taitelbaum2018,
  author={Hagai Taitelbaum and Ehud Ben-Reuven and Jacob Goldberger},
  title={Adding New Classes without Access to the Original Training Data with Applications to Language Identification},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1808--1812},
  doi={10.21437/Interspeech.2018-1342},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1342}
}