In the context of our advanced AI topics course, I recorded an introductory tutorial on Learning from Positive and Unlabeled Data, which can be found at https://dtai.cs.kuleuven.be/pulearning.
The goal of the tutorial is to introduce the field of PU learning and to give an overview of its main assumptions and classes of techniques. The tutorial consists of six videos covering the following topics:
- PU Learning and its sources
- PU Learning definitions
- Assumptions to enable PU Learning
- Two-step techniques
- Biased learning
- Incorporation of the labeling mechanism
The tutorial largely follows our survey paper (Bekker & Davis, 2020).
Jessa Bekker and Jesse Davis. Learning from positive and unlabeled data: a survey. Machine Learning, 109(4):719–760, 2020.
@article{bekker2020mlj,
author = {Bekker, Jessa and Davis, Jesse},
title = {{L}earning from positive and unlabeled data: a survey},
journal = {{M}achine {L}earning},
volume = {109},
number = {4},
pages = {719--760},
month = may,
year = {2020},
doi = {10.1007/s10994-020-05877-5},
url = {https://lirias.kuleuven.be/3028883}
}
Abstract: Learning from positive and unlabeled data, or PU learning, is the setting where a learner only has access to positive examples and unlabeled data. The assumption is that the unlabeled data can contain both positive and negative examples. This setting has attracted increasing interest within the machine learning literature, as this type of data naturally arises in applications such as medical diagnosis and knowledge base completion. This article provides a survey of the current state of the art in PU learning. It proposes seven key research questions that commonly arise in this field and provides a broad overview of how the field has tried to address them.
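To make the setting concrete, below is a minimal sketch in Python (using scikit-learn) of one classical way to incorporate the labeling mechanism: estimating the label frequency under the SCAR (selected completely at random) assumption in the spirit of the Elkan–Noto approach, which the survey covers. The synthetic data, model choice, and variable names are illustrative assumptions, not code from the tutorial.

# Minimal PU learning sketch under SCAR (Elkan-Noto style); all data and
# model choices here are illustrative assumptions, not the tutorial's code.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Fully labeled ground truth, used only to simulate the PU setting.
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.6, 0.4], random_state=0)

# SCAR labeling: each positive example is labeled with constant probability c.
c_true = 0.3
s = (y == 1) & (rng.random(len(y)) < c_true)   # s = 1 means "labeled positive"

# Step 1: train a "non-traditional" classifier to predict s (labeled vs. unlabeled).
X_train, X_hold, s_train, s_hold, y_train, y_hold = train_test_split(
    X, s, y, test_size=0.25, random_state=0)
g = LogisticRegression(max_iter=1000).fit(X_train, s_train)

# Step 2: estimate the label frequency c = P(s=1 | y=1) as the average
# predicted P(s=1 | x) over held-out labeled examples.
c_hat = g.predict_proba(X_hold[s_hold])[:, 1].mean()

# Step 3: correct the scores: under SCAR, P(y=1 | x) = P(s=1 | x) / c.
p_pos = np.clip(g.predict_proba(X_hold)[:, 1] / c_hat, 0, 1)

print(f"estimated label frequency: {c_hat:.2f} (true: {c_true})")
print(f"accuracy after correction: {((p_pos > 0.5) == y_hold).mean():.2f}")

The key design choice illustrated here is that, under SCAR, a classifier trained to separate labeled from unlabeled examples can be rescaled by the label frequency to recover class probabilities; the two-step and biased-learning techniques in the tutorial tackle the same problem from different angles.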