DeepPipe: Deep Pipeline Embeddings for AutoML

DeepPipe efficiently optimizes machine learning pipelines using meta-learning. For details, see our paper, Deep Pipeline Embeddings for AutoML, accepted at KDD 2023. Our blog post gives a more accessible overview of how the method works.

Installation

We provide an API for optimizing scikit-learn pipelines over the TensorOboe search space. You can use it to search for accurate pipelines or to benchmark your machine learning model on tabular data.

conda create -n deeppipe_env python==3.9
conda activate deeppipe_env
pip install deeppipe_api==0.1.4

Getting started

The example below uses an OpenML dataset, but DeepPipe works with any tabular data stored in a pandas DataFrame.

from deeppipe_api.deeppipe import load_data, openml, DeepPipe

task_id = 37
task = openml.tasks.get_task(task_id)
X_train, X_test, y_train, y_test = load_data(task, fold=0)
deep_pipe = DeepPipe(n_iters=50,       # number of Bayesian optimization (BO) iterations
                     time_limit=3600)  # time budget in seconds
deep_pipe.fit(X_train, y_train)
y_pred = deep_pipe.predict(X_test)

# Evaluate on the held-out test split
score = deep_pipe.score(X_test, y_test)
print("Test acc.:", score)

# Print the best pipeline found
print(deep_pipe.model)

Note: When comparing against other AutoML optimizers, keep in mind that their search spaces may differ.
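
For data that does not come from OpenML, a workflow along the following lines may serve as a starting point. It is a minimal sketch, not part of the package: `split_dataframe` and `run_deeppipe` are hypothetical helpers, and the sketch assumes that `DeepPipe.fit` and `DeepPipe.score` accept a pandas DataFrame of features together with a pandas Series of labels, and that the DataFrame index is unique.

```python
import pandas as pd


def split_dataframe(df, target, test_frac=0.25, seed=0):
    """Randomly split a DataFrame into train/test features and labels.

    Hypothetical helper; assumes the index of `df` is unique.
    """
    test = df.sample(frac=test_frac, random_state=seed)
    train = df.drop(index=test.index)
    return (train.drop(columns=[target]), test.drop(columns=[target]),
            train[target], test[target])


def run_deeppipe(df, target):
    """Fit DeepPipe on a custom DataFrame and return the test score.

    Assumption: fit/score accept pandas objects as described above.
    """
    # Imported lazily so the splitting helper can be used without the package.
    from deeppipe_api.deeppipe import DeepPipe

    X_train, X_test, y_train, y_test = split_dataframe(df, target)
    deep_pipe = DeepPipe(n_iters=50, time_limit=3600)
    deep_pipe.fit(X_train, y_train)
    return deep_pipe.score(X_test, y_test)
```

With your own data loaded, for example via `pd.read_csv`, the call would be `run_deeppipe(df, target="label")`, where `"label"` stands for the name of your target column.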

Ensemble of Pipelines

The best pipelines found during the search can be combined into an ensemble, which is built with a greedy approach.

from deeppipe_api.deeppipe import load_data, openml, DeepPipe

task = openml.tasks.get_task(task_id=37)
X_train, X_test, y_train, y_test = load_data(task, fold=0)
deep_pipe = DeepPipe(n_iters=50,            # number of Bayesian optimization (BO) iterations
                     time_limit=3600,       # time budget in seconds
                     create_ensemble=True,  # build an ensemble from the best pipelines
                     ensemble_size=10)
deep_pipe.fit(X_train, y_train)
y_pred = deep_pipe.predict(X_test)
score = deep_pipe.score(X_test, y_test)
print("Test acc.:", score)
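
The details of the greedy procedure are not spelled out here. As a rough illustration of what such a procedure can look like, the sketch below performs greedy ensemble selection with replacement over validation-set class probabilities: at each step it adds the candidate whose inclusion gives the highest accuracy of the combined prediction. This illustrates the general technique under our own assumptions and is not DeepPipe's actual implementation; all function names and the toy data are hypothetical.

```python
def accuracy(probs, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    correct = 0
    for row, label in zip(probs, labels):
        pred = max(range(len(row)), key=row.__getitem__)
        correct += pred == label
    return correct / len(labels)


def add_rows(a, b):
    """Element-wise sum of two (n_samples x n_classes) nested lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def greedy_ensemble(candidate_probs, labels, ensemble_size):
    """Return indices of the selected candidates (repeats are allowed)."""
    n_classes = len(candidate_probs[0][0])
    running_sum = [[0.0] * n_classes for _ in labels]
    chosen = []
    for _ in range(ensemble_size):
        best_idx, best_acc = None, -1.0
        for idx, probs in enumerate(candidate_probs):
            # The argmax of the summed scores equals the argmax of their mean.
            acc = accuracy(add_rows(running_sum, probs), labels)
            if acc > best_acc:  # ties keep the earliest candidate
                best_idx, best_acc = idx, acc
        chosen.append(best_idx)
        running_sum = add_rows(running_sum, candidate_probs[best_idx])
    return chosen


def ensemble_probs(candidate_probs, chosen):
    """Average the class probabilities of the chosen ensemble members."""
    total = candidate_probs[chosen[0]]
    for idx in chosen[1:]:
        total = add_rows(total, candidate_probs[idx])
    return [[value / len(chosen) for value in row] for row in total]


# Toy validation data: three candidate pipelines, four samples, two classes.
labels = [0, 1, 0, 1]
candidates = [
    [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.6, 0.4]],      # wrong on sample 3
    [[0.45, 0.55], [0.55, 0.45], [0.45, 0.55], [0.1, 0.9]],  # right only on sample 3
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],      # uninformative
]
chosen = greedy_ensemble(candidates, labels, ensemble_size=2)
print(chosen)  # [0, 1]
print(accuracy(ensemble_probs(candidates, chosen), labels))  # 1.0
```

In this toy case the second candidate is weak on its own (25% accuracy) but corrects the single mistake of the first, so the greedy step selects it. Selecting for complementary errors rather than individual accuracy is the usual motivation for this kind of ensembling.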

Colab Notebook

You can try DeepPipe in this Colab notebook.

Advanced Usage

To meta-train DeepPipe or test other search spaces, see the folder src/deeppipe_api/experiments/.

Our Paper

If you use this repository/package, please cite our paper:

@inproceedings{pineda2023_deeppipe,
author = {Pineda Arango, Sebastian and Grabocka, Josif},
title = {Deep Pipeline Embeddings for AutoML},
year = {2023},
isbn = {9798400701030},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3580305.3599303},
doi = {10.1145/3580305.3599303},
booktitle = {Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
pages = {1907–1919},
numpages = {13},
location = {Long Beach, CA, USA},
series = {KDD '23}
}
