diff --git a/CONTRIBUTE.md b/.github/CONTRIBUTING.md
similarity index 100%
rename from CONTRIBUTE.md
rename to .github/CONTRIBUTING.md
diff --git a/README.md b/README.md
index 25ce637..ea67c81 100644
--- a/README.md
+++ b/README.md
@@ -1,37 +1,69 @@
-# gentun: distributed genetic algorithm for hyperparameter tuning
+<div align="center">
+  <img src="assets/icon.png" alt="plugin-icon">
+  <h1>gentun</h1>
+  <p>
+    Python package for distributed genetic algorithm-based hyperparameter tuning
+  </p>
+</div>
+
+[![PyPI](https://img.shields.io/pypi/v/gentun)](https://pypi.org/project/gentun/) [![PyPI - Downloads](https://img.shields.io/pypi/dm/gentun)](https://pypi.org/project/gentun/) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/gentun)](https://pypi.org/project/gentun/) [![PyPI - License](https://img.shields.io/pypi/l/gentun)](https://pypi.org/project/gentun/)
+
+<details>
+  <summary>Table of Contents</summary>
+  <ol>
+    <li><a href="#about-the-project">About The Project</a></li>
+    <li><a href="#installation">Installation</a></li>
+    <li><a href="#usage">Usage</a></li>
+    <li><a href="#supported-models">Supported Models</a></li>
+    <li><a href="#contributing">Contributing</a></li>
+    <li><a href="#references">References</a></li>
+  </ol>
+</details>
+
+## About The Project
+
 The goal of this project is to create a simple framework for
 [hyperparameter](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)) tuning of machine learning models,
 like Neural Networks and Gradient Boosting Trees, using a genetic algorithm. Evaluating the fitness of an individual in
-a population involves training a model with a specific set of hyperparameters, which is a time-consuming process. To
-address this problem, we provide a controller-worker . Multiple workers can handle model training and cross-validation
-of individuals provided by a controller while this controller manages the generation of offspring through reproduction
-and mutation.
+a population requires training a model with a specific set of hyperparameters, which is a time-consuming task. To
+address this issue, we offer a controller-worker system: multiple workers can perform model training and
+cross-validation of individuals provided by a controller while this controller manages the generation of offspring
+through reproduction and mutation.
 
 *"Parameter tuning is a dark art in machine learning, the optimal parameters of a model can depend on many scenarios."*
 ~ [XGBoost tutorial](https://xgboost.readthedocs.io/en/stable/tutorials/param_tuning.html) on Parameter Tuning
 
-*"[...] The number of possible network structures increases exponentially with the number of layers in the network,
-which inspires us to adopt the genetic algorithm to efficiently traverse this large search space."* ~
-[Genetic CNN](https://arxiv.org/abs/1703.01513) paper
-
-- [Installation](#installation)
-- [Usage](#usage)
-  - [Single node](#single-node)
-  - [Pre-defined individuals](#adding-pre-defined-individuals)
-  - [Grid search](#performing-a-grid-search)
-  - [Multiple nodes](#multiple-nodes)
-    - [Redis setup](#redis-setup)
-    - [Controller](#controller-node)
-    - [Workers](#worker-nodes)
-- [Supported models](#supported-models)
-- [Contributing](#contributing)
-- [References](#references)
+*"The number of possible network structures increases exponentially with the number of layers in the network, which
+inspires us to adopt the genetic algorithm to efficiently traverse this large search space."* ~
+[Genetic CNN paper](https://arxiv.org/abs/1703.01513)
 
 ## Installation
 
@@ -39,6 +71,12 @@ which inspires us to adopt the genetic algorithm to efficiently traverse this large search space."* ~
 pip install gentun
 ```
 
+Some model handlers require additional libraries. You can also install their dependencies with:
+
+```bash
+pip install "gentun[xgboost]"  # or "gentun[tensorflow]"
+```
+
 To set up a development environment, run:
 
 ```bash
@@ -49,7 +87,7 @@ flit install --deps develop --extras tensorflow,xgboost
 ```
 
 ## Usage
 
-### Single node
+### Single Node
 
 The most basic way to run the algorithm is using a single machine, as shown in the following example where we use it to
 find the optimal hyperparameters of an [`xgboost`](https://xgboost.readthedocs.io/en/stable/) model. First, we download
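
The hunk above cuts off right where the single-node example begins. The sketch below fills in the overall flow under stated assumptions: the gene and algorithm class names and the `nfold` option are illustrative guesses, not the confirmed `gentun` API; only the `Population(genes, handler, size, x_train, y_train, **kwargs)` pattern and `algorithm.run()` appear in this diff.

```python
import numpy as np

from gentun.algorithms import Tournament  # assumed algorithm class name
from gentun.genes import RandomChoice  # assumed gene class; a gene defines how one hyperparameter is sampled
from gentun.models.xgboost import XGBoostCV  # assumed module path for the handler used throughout this README
from gentun.populations import Population

x_train = np.random.rand(1000, 10)  # placeholder for the dataset the README example downloads
y_train = np.random.randint(2, size=1000)

# One gene per hyperparameter to search over
genes = [
    RandomChoice("max_depth", list(range(3, 11))),
    RandomChoice("min_child_weight", list(range(0, 11))),
]

kwargs = {"nfold": 5}  # assumed handler option: number of cross-validation folds

# A population of 50 individuals; the handler scores each one via k-fold cross-validation
population = Population(genes, XGBoostCV, 50, x_train, y_train, **kwargs)
algorithm = Tournament(population)
algorithm.run(10)  # evolve for 10 generations; fitness is maximized by default
```
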
@@ -114,7 +152,7 @@ follows this convention. Nonetheless, to make the framework more flexible, you can
 `algorithm.run()` to override this behavior and minimize your fitness metric (e.g. a loss such as *rmse* or *binary
 crossentropy*).
 
-#### Adding pre-defined individuals
+#### Adding Pre-defined Individuals
 
 Oftentimes, it's convenient to initialize the genetic algorithm with some known individuals instead of a random
 population. You can add custom individuals to the population before running the genetic algorithm if you already have
@@ -137,7 +175,7 @@ population = Population(genes, XGBoostCV, 49, x_train, y_train, **kwargs)
 
 population.add_individual(hyperparams)
 ```
 
-#### Performing a grid search
+#### Performing a Grid Search
 
 Grid search is also widely used for hyperparameter optimization. This framework provides `gentun.populations.Grid`,
 which can be used to conduct a grid search over a single generation pass. You must use genes which define the `sample()`
@@ -164,7 +202,7 @@ population = Grid(genes, XGBoostCV, gene_samples, x_train, y_train, **kwargs)
 ```
 
 Running the genetic algorithm on this population for just one generation is equivalent to doing a grid search over 10
 `learning_rate` values, all `max_depth` values between 3 and 10, and all `min_child_weight` values between 0 and 10.
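
The `gene_samples` sequence passed to `Grid` above pairs up with `genes` one-to-one. Assuming each entry gives the number of values drawn from the corresponding gene, as the summary line suggests, the grid in question could be declared like this (gene class names and their keyword arguments are illustrative assumptions):

```python
import numpy as np

from gentun.genes import RandomChoice, RandomLogUniform  # assumed gene classes; both must implement sample()
from gentun.models.xgboost import XGBoostCV  # assumed module path for the XGBoost handler
from gentun.populations import Grid

x_train = np.random.rand(1000, 10)  # placeholder data
y_train = np.random.randint(2, size=1000)

genes = [
    RandomLogUniform("learning_rate", minimum=0.001, maximum=0.5),  # continuous, so it must be sampled
    RandomChoice("max_depth", list(range(3, 11))),  # 8 discrete values
    RandomChoice("min_child_weight", list(range(0, 11))),  # 11 discrete values
]

# One entry per gene: 10 sampled learning rates, all 8 max_depth values, all 11 min_child_weight values
gene_samples = [10, 8, 11]

population = Grid(genes, XGBoostCV, gene_samples, x_train, y_train, nfold=5)  # nfold: assumed handler option
```
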
 
-### Multiple nodes
+### Multiple Nodes
 
 You can speed up the genetic algorithm by using several machines to evaluate individuals in parallel. One of the
 nodes has to act as a *controller*, generating populations and running the genetic algorithm. Each time this
 *controller* node needs
@@ -172,7 +210,7 @@ to evaluate an individual from a population, it will send a request to a job queue
 receive the model's hyperparameters and perform model fitting through k-fold cross-validation. The more *workers* you
 run, the faster the algorithm will evolve each generation.
 
-#### Redis setup
+#### Redis Setup
 
 The simplest way to start the Redis service that will host the communication queues is through `docker`:
@@ -180,7 +218,7 @@ The simplest way to start the Redis service that will host the communication queues is through `docker`:
 docker run -d --rm --name gentun-redis -p 6379:6379 redis
 ```
 
-#### Controller node
+#### Controller Node
 
 To run the distributed genetic algorithm, define a `gentun.services.RedisController` and pass it to the `Population`
 instead of the `x_train` and `y_train` data. When the algorithm needs to evaluate the fittest individual, it will pass
@@ -198,7 +236,7 @@ population = Population(genes, XGBoostCV, 100, controller=controller, **kwargs)
 
 # ... run algorithm
 ```
 
-#### Worker nodes
+#### Worker Nodes
 
 The worker nodes are defined using the `gentun.services.RedisWorker` class, passing the handler to it. Then, we use
 its `run()` method with train data to begin processing jobs from the queue. You can use as many nodes as desired as long
@@ -214,7 +252,7 @@ worker = RedisWorker("experiment", XGBoostCV, host="localhost", port=6379)
 
 worker.run(x_train, y_train)
 ```
 
-## Supported models
+## Supported Models
 
 This project supports hyperparameter tuning for the following models:
@@ -235,17 +273,17 @@ Our roadmap includes:
 
 You can also help us speed up hyperparameter search by contributing your spare GPU time.
 
-For more details on how to contribute, please check our [contribution guide](./CONTRIBUTE.md).
+For more details on how to contribute, please check our [contribution guidelines](.github/CONTRIBUTING.md).
 
 ## References
 
-### Genetic algorithms
+### Genetic Algorithms
 
 * Artificial Intelligence: A Modern Approach. 3rd edition.
   Section 4.1.4
 * https://github.com/DEAP/deap
 * http://www.theprojectspot.com/tutorial-post/creating-a-genetic-algorithm-for-beginners/3
 
-### XGBoost parameter tuning
+### XGBoost Parameter Tuning
 
 * http://xgboost.readthedocs.io/en/latest/parameter.html
 * http://xgboost.readthedocs.io/en/latest/how_to/param_tuning.html
diff --git a/assets/icon.png b/assets/icon.png
new file mode 100644
index 0000000..c0b2009
Binary files /dev/null and b/assets/icon.png differ
diff --git a/src/gentun/__init__.py b/src/gentun/__init__.py
index 742d54a..54de4e9 100644
--- a/src/gentun/__init__.py
+++ b/src/gentun/__init__.py
@@ -1,6 +1,7 @@
 """
 gentun - Distributed Genetic Algorithm for Hyperparameter Tuning
 """
+
 from .config import setup_logging
 
 __version__ = "0.0.1"
diff --git a/src/gentun/config.py b/src/gentun/config.py
index b3961e9..c8a0044 100644
--- a/src/gentun/config.py
+++ b/src/gentun/config.py
@@ -1,6 +1,7 @@
 """
 logging configurations
 """
+
 import logging
 import sys
 
diff --git a/src/gentun/models/tensorflow.py b/src/gentun/models/tensorflow.py
index ba70ae4..8c6f340 100644
--- a/src/gentun/models/tensorflow.py
+++ b/src/gentun/models/tensorflow.py
@@ -1,6 +1,7 @@
 """
 Models implemented in tensorflow
 """
+
 import logging
 import os
 from typing import Any, Sequence, Union
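
Read together, the Controller Node and Worker Nodes hunks in the README diff describe a two-process workflow. The condensed sketch below puts the two halves side by side; the `RedisController` constructor arguments are assumed to mirror `RedisWorker`'s queue name, host, and port, which this diff does not confirm.

```python
# controller.py: runs the genetic algorithm and holds no training data
from gentun.models.xgboost import XGBoostCV  # assumed module path
from gentun.populations import Population
from gentun.services import RedisController

# genes, kwargs: as defined in the earlier examples
controller = RedisController("experiment", host="localhost", port=6379)  # assumed signature
population = Population(genes, XGBoostCV, 100, controller=controller, **kwargs)
# ... run the algorithm exactly as in the single-node case
```

```python
# worker.py: one per machine; holds the data and evaluates individuals from the queue
from gentun.models.xgboost import XGBoostCV  # assumed module path
from gentun.services import RedisWorker

worker = RedisWorker("experiment", XGBoostCV, host="localhost", port=6379)
worker.run(x_train, y_train)  # blocks, fitting queued individuals via k-fold cross-validation
```
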