dataiku-research/OpenAL

Active Learning Benchmark

This repository is the official implementation of anonymous 2022

How to run

The file main_run.py contains the necessary code to re-run the benchmark.

Datasets characteristics

1461 - bank-marketing

https://www.openml.org/search?type=data&status=active&id=1461

The data relates to direct marketing campaigns of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y).

Number of instances: 45211; number of features: 17 (7 numeric); number of classes: 2

Model used: Random Forest

1471 - eeg-eye-state

https://www.openml.org/search?type=data&status=active&id=1471

All data is from one continuous EEG measurement with the Emotiv EEG Neuroheadset. The eye state was detected via a camera during the EEG measurement and added later manually to the file after analyzing the video frames. '1' indicates the eye-closed and '0' the eye-open state.

Number of instances: 14980; number of features: 15; number of classes: 2

Model used: Multi Layer Perceptron

1502 - skin-segmentation

https://www.openml.org/search?type=data&status=active&id=1502

Number of instances: 245057; number of features: 4; number of classes: 2

Model used: Random Forest

1590 - adult

https://www.openml.org/search?type=data&status=active&id=1590

The prediction task is to determine whether a person makes over 50K a year.

Number of instances: 48842; number of features: 15; number of classes: 2

Model used: Gradient Boosting Classifier

40922 - Run_or_walk_information

https://www.openml.org/search?type=data&status=active&id=40922

Classes: "0" is walking, "1" is running.

Number of instances: 88588; number of features: 7; number of classes: 2

Model used: Random Forest

41138 - APSFailure

https://www.openml.org/search?type=data&status=active&id=41138

Number of instances: 76000; number of features: 171; number of classes: 2

Model used: Gradient Boosting Classifier

41162 - kick

https://www.openml.org/search?type=data&status=active&id=41162

Predict whether a car purchased at auction is a kick, that is, whether the vehicle has serious issues that prevent it from being sold to customers.

Number of instances: 72983; number of features: 33; number of classes: 2

Model used: Gradient Boosting Classifier

42395 - SantanderCustomerSatisfaction

https://www.openml.org/search?type=data&status=active&id=42395

A binary classification problem: is a customer satisfied?

Number of instances: 200000; number of features: 202; number of classes: 2

Model used: Gradient Boosting Classifier

42803 - road-safety

https://www.openml.org/search?type=data&status=active&id=42803

Predict the sex of the driver in road accidents.

Number of instances: 363243; number of features: 67; number of classes: 3

Model used: Gradient Boosting Classifier

43439 - Medical-Appointment-No-Shows

https://www.openml.org/search?type=data&status=active&id=43439

Is it possible to predict whether someone will miss an appointment?

Number of instances: 110527; number of features: 13; number of classes: 2

Model used: Gradient Boosting Classifier

43551 - Employee-Turnover-at-TECHCO

https://www.openml.org/search?type=data&status=active&id=43551

Number of instances: 34452; number of features: 10; number of classes: 2

Model used: Gradient Boosting Classifier
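
The OpenML datasets listed above share one URL pattern and are each paired with a single model. As a rough sketch, they could be collected in a small registry for scripting purposes. The names `OPENML_DATASETS`, `openml_url` and `models_summary` are hypothetical and are not taken from main_run.py; only the ids, dataset names and model names come from this README.

```python
from collections import Counter

# (OpenML id, dataset name, model) triples copied from the list above.
# This registry is illustrative only; it is not part of main_run.py.
OPENML_DATASETS = [
    (1461, "bank-marketing", "Random Forest"),
    (1471, "eeg-eye-state", "Multi Layer Perceptron"),
    (1502, "skin-segmentation", "Random Forest"),
    (1590, "adult", "Gradient Boosting Classifier"),
    (40922, "Run_or_walk_information", "Random Forest"),
    (41138, "APSFailure", "Gradient Boosting Classifier"),
    (41162, "kick", "Gradient Boosting Classifier"),
    (42395, "SantanderCustomerSatisfaction", "Gradient Boosting Classifier"),
    (42803, "road-safety", "Gradient Boosting Classifier"),
    (43439, "Medical-Appointment-No-Shows", "Gradient Boosting Classifier"),
    (43551, "Employee-Turnover-at-TECHCO", "Gradient Boosting Classifier"),
]


def openml_url(data_id: int) -> str:
    """Build the OpenML search URL in the form used by this README."""
    return f"https://www.openml.org/search?type=data&status=active&id={data_id}"


def models_summary(datasets=OPENML_DATASETS) -> Counter:
    """Count how many datasets are paired with each model."""
    return Counter(model for _, _, model in datasets)


if __name__ == "__main__":
    # Print one line per dataset: id, name, model and OpenML URL.
    for data_id, name, model in OPENML_DATASETS:
        print(f"{data_id:>5}  {name:<30} {model:<28} {openml_url(data_id)}")
```

Keeping the ids in one place makes it easy to loop over every tabular dataset, for instance to open each OpenML page or to group runs by model.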

MNIST

https://www.tensorflow.org/datasets/catalog/mnist

Number of instances: 60000; number of features: 28 x 28; number of classes: 10

Model used: Multi Layer Perceptron

CIFAR10

https://www.tensorflow.org/datasets/catalog/cifar10

Number of instances: 60000; number of features: 2048; number of classes: 10

Model used: Multi Layer Perceptron

Embeddings

We use the embeddings generated by a ResNet50 pretrained on ImageNet.

Embeddings trained with contrastive learning

We use embeddings generated with contrastive learning. See https://github.com/google-research/simclr

CIFAR100

https://www.tensorflow.org/datasets/catalog/cifar100

Number of instances: 60000; number of features: 2048; number of classes: 100

Model used: Multi Layer Perceptron

Embeddings

We use the embeddings generated by a ResNet50 pretrained on ImageNet.

Embeddings trained with contrastive learning

We use embeddings generated with contrastive learning. See https://github.com/google-research/simclr

About

Benchmarking active learning on tabular datasets
