Code repository accompanying the full paper accepted at the LIDTA-2022 workshop of ECML/PKDD 2022.
Note: This repository initially served as the code repository for my thesis in the Master of Artificial Intelligence programme at KU Leuven, but it was later modified and extended to accommodate the relevant content of the LIDTA-2022 full-paper submission.
The code provides an experimental evaluation of how the structure of the validation set, i.e., its size and label bias, impacts the performance of different CASH (Combined Algorithm Selection and Hyperparameter optimization) search strategies within the context of anomaly detection.
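To make "size and label bias" concrete, here is a minimal sketch of how a validation set of a given size could be drawn either with the dataset's own anomaly ratio or with equal class counts. It is not taken from this repository: the function name `sample_validation_indices`, the 0/1 label encoding, and the reading of "stratified" and "balanced" used here are assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_validation_indices(labels, size, strategy, seed=0):
    """Pick `size` indices from `labels` (0 = normal, 1 = anomaly).

    Hypothetical helper, not part of this repository.
    "stratified": keep the anomaly ratio found in `labels`.
    "balanced":   draw equally many normals and anomalies.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    if strategy == "stratified":
        n_anomalies = round(size * len(by_class[1]) / len(labels))
    elif strategy == "balanced":
        n_anomalies = size // 2
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    n_normals = size - n_anomalies
    if n_anomalies > len(by_class[1]) or n_normals > len(by_class[0]):
        raise ValueError("not enough examples of one class for this size")

    # Sample without replacement within each class.
    chosen = rng.sample(by_class[1], n_anomalies) + rng.sample(by_class[0], n_normals)
    return sorted(chosen)


# Toy labels (made up): 900 normals and 100 anomalies, i.e. 10% anomalies.
labels = [0] * 900 + [1] * 100
stratified = sample_validation_indices(labels, 100, "stratified")
balanced = sample_validation_indices(labels, 100, "balanced")
print(sum(labels[i] for i in stratified))  # 10 anomalies, same ratio as the data
print(sum(labels[i] for i in balanced))    # 50 anomalies, half of the set
```

Under these assumptions, a balanced validation set over-represents anomalies relative to the data, which is the kind of label bias the experiments vary.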
- `data` directory contains a sub-directory of the `original` datasets used in the experiments, while the `processed` sub-directory is created by the `src/notebooks/dataset_preprocessor.ipynb` notebook.
- `src` directory contains the core implementation code, comprised of Python scripts and notebooks. It also contains the `auto-sklearn` package, which is modified to accommodate unsupervised anomaly detection tasks.
- `results` directory contains the raw results of the paper for the different CASH search spaces.
Provide the experiment parameters in `src/config.json`:

- `datasets`: list of datasets
- `iterations`: list of iterations, i.e., different versions of the train/test splits (1, 2, ..., 10)
- `classifiers`: list of anomaly detectors
- `search_space`: version of search space (sp1, sp2 or default)
- `validation_set_split_strategies`: list of strategies to split the validation set (stratified, balanced)
- `validation_set_sizes`: list of sizes for the validation set (20, 50, 100, 200)
- `total_budget`: total duration of a single search
- `per_run_budget`: minimum duration of a single run

and run `auto_ad_main.py`.
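As an illustration of the parameters listed above, the following sketch builds a configuration with those keys and prints it as JSON. Only the key names and the documented option values come from this README. The dataset and detector names are placeholders, the budgets are assumed to be in seconds, `search_space` is assumed to be a single string, and the count at the end assumes every combination of dataset, iteration, split strategy and validation set size is searched.

```python
import json
from itertools import product

config = {
    "datasets": ["dataset_a", "dataset_b"],        # placeholder names
    "iterations": list(range(1, 11)),              # train/test split versions 1..10
    "classifiers": ["detector_a", "detector_b"],   # placeholder names
    "search_space": "sp1",                         # sp1, sp2 or default
    "validation_set_split_strategies": ["stratified", "balanced"],
    "validation_set_sizes": [20, 50, 100, 200],
    "total_budget": 300,    # assumed to be seconds
    "per_run_budget": 30,   # assumed to be seconds
}

# Basic sanity checks against the options documented above.
assert config["search_space"] in {"sp1", "sp2", "default"}
assert set(config["validation_set_split_strategies"]) <= {"stratified", "balanced"}
assert config["per_run_budget"] <= config["total_budget"]

# JSON text that could be saved as src/config.json.
config_text = json.dumps(config, indent=2)
print(config_text)

# Number of searches if every combination is run (an assumption).
grid = list(product(
    config["datasets"],
    config["iterations"],
    config["validation_set_split_strategies"],
    config["validation_set_sizes"],
))
print(len(grid))  # 2 * 10 * 2 * 4 = 160
```

Counting the combinations before a run gives a rough estimate of the total compute time: under the assumptions above, 160 searches of `total_budget` each.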
Name | Description | Link |
---|---|---|
Auto-Sklearn | Automated machine learning toolkit | 🔗 |
PyOD | Python library for anomaly detection | 🔗 |
Datasets | Anomaly detection datasets | 🔗 |
Copyright © 2022 Ioannis Antoniadis