docs: Enhance repo documentation (#49)
- [x] Add README header and icon
- [x] Make collapsible table of content
- [x] Fix some wording problems in README
- [x] Move CONTRIBUTE.md to .github/
gmontamat authored Sep 12, 2024
1 parent 29859c0 commit ebe0c4b
Showing 6 changed files with 73 additions and 32 deletions.
CONTRIBUTE.md → .github/CONTRIBUTING.md: file renamed without changes.
102 changes: 70 additions & 32 deletions README.md
@@ -1,44 +1,82 @@
<a name="readme-top"></a>

<br />
<div align="center">
<a href="https://github.com/gmontamat/gentun">
<img alt="plugin-icon" src="assets/icon.png">
</a>
<h1 style="margin: 0;" align="center">gentun</h1>
<p>
Python package for distributed genetic algorithm-based hyperparameter tuning
</p>
</div>

[![PyPI](https://img.shields.io/pypi/v/gentun)](https://pypi.org/project/gentun/)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/gentun)](https://pypi.org/project/gentun/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/gentun)](https://pypi.org/project/gentun/)
[![PyPI - License](https://img.shields.io/pypi/l/gentun)](https://pypi.org/project/gentun/)

<!-- TABLE OF CONTENTS -->
<details>
<summary>Table of Contents</summary>
<ol>
<li><a href="#about-the-project">About The Project</a></li>
<li><a href="#installation">Installation</a></li>
<li>
<a href="#usage">Usage</a>
<ul>
<li>
<a href="#single-node">Single Node</a>
<ul>
<li><a href="#adding-pre-defined-individuals">Adding Pre-defined Individuals</a></li>
<li><a href="#performing-a-grid-search">Performing a Grid Search</a></li>
</ul>
</li>
<li>
<a href="#multiple-nodes">Multiple Nodes</a>
<ul>
<li><a href="#redis-setup">Redis Setup</a></li>
<li><a href="#controller-node">Controller Node</a></li>
<li><a href="#worker-nodes">Worker Nodes</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#supported-models">Supported Models</a></li>
<li><a href="#contributing">Contributing</a></li>
<li><a href="#references">References</a></li>
</ol>
</details>

## About The Project

The goal of this project is to create a simple framework
for [hyperparameter](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)) tuning of machine learning models,
like Neural Networks and Gradient Boosting Trees, using a genetic algorithm. Evaluating the fitness of an individual in
a population requires training a model with a specific set of hyperparameters, which is a time-consuming task. To
address this issue, we offer a controller-worker system: multiple workers can perform model training and
cross-validation of the individuals provided by a controller, while the controller manages the generation of offspring
through reproduction and mutation.

*"Parameter tuning is a dark art in machine learning, the optimal parameters of a model can depend on many scenarios."*
~ [XGBoost tutorial](https://xgboost.readthedocs.io/en/stable/tutorials/param_tuning.html) on Parameter Tuning

*"[...] The number of possible network structures increases exponentially with the number of layers in the network,
which inspires us to adopt the genetic algorithm to efficiently traverse this large search space."* ~
[Genetic CNN](https://arxiv.org/abs/1703.01513) paper

- [Installation](#installation)
- [Usage](#usage)
- [Single node](#single-node)
- [Pre-defined individuals](#adding-pre-defined-individuals)
- [Grid search](#performing-a-grid-search)
- [Multiple nodes](#multiple-nodes)
- [Redis setup](#redis-setup)
- [Controller](#controller-node)
- [Workers](#worker-nodes)
- [Supported models](#supported-models)
- [Contributing](#contributing)
- [References](#references)
*"The number of possible network structures increases exponentially with the number of layers in the network, which
inspires us to adopt the genetic algorithm to efficiently traverse this large search space."* ~
[Genetic CNN paper](https://arxiv.org/abs/1703.01513)

## Installation

```bash
pip install gentun
```

Some model handlers require additional libraries. You can install these optional dependencies with:

```bash
pip install "gentun[xgboost]" # or "gentun[tensorflow]"
```

To set up a development environment, run:

```bash
flit install --deps develop --extras tensorflow,xgboost
```

@@ -49,7 +87,7 @@

## Usage

### Single Node

The most basic way to run the algorithm is on a single machine, as shown in the following example, where we use it to
find the optimal hyperparameters of an [`xgboost`](https://xgboost.readthedocs.io/en/stable/) model. First, we download
@@ -114,7 +152,7 @@
follows this convention. Nonetheless, to make the framework more flexible, you can configure
`algorithm.run()` to override this behavior and minimize your fitness metric instead (e.g. when the metric is a loss
such as *rmse* or *binary cross-entropy*).
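
For reference, a minimal end-to-end single-node run could look like the sketch below. Only `Population` and
`XGBoostCV` appear verbatim in this README; the gene classes, the `Tournament` driver, and their signatures are
illustrative assumptions, so check `gentun.genes` and `gentun.algorithms` for the actual names.

```python
import numpy as np

from gentun.models.xgboost import XGBoostCV
from gentun.populations import Population

# Hypothetical imports -- the real gene and algorithm names may differ.
from gentun.genes import RandomChoice, RandomUniform
from gentun.algorithms import Tournament

# Toy data standing in for a real dataset
x_train = np.random.rand(500, 10)
y_train = np.random.randint(0, 2, size=500)

# Each gene describes one hyperparameter and how to sample it
genes = [
    RandomUniform("learning_rate", 0.01, 0.3),
    RandomChoice("max_depth", range(3, 11)),
]

population = Population(genes, XGBoostCV, 100, x_train, y_train)
algorithm = Tournament(population)
algorithm.run(10)  # the generations argument is an assumption
```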

#### Adding Pre-defined Individuals

Oftentimes, it's convenient to initialize the genetic algorithm with some known individuals instead of a random
population. You can add custom individuals to the population before running the genetic algorithm if you already have
@@ -137,7 +175,7 @@
```python
population = Population(genes, XGBoostCV, 49, x_train, y_train, **kwargs)
population.add_individual(hyperparams)
```
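
The `hyperparams` passed to `add_individual()` is not shown in this excerpt; given the gene definitions, it is
presumably a mapping from gene names to known-good values, for example:

```python
# Illustrative values only -- the keys must match the gene names in `genes`.
hyperparams = {"learning_rate": 0.1, "max_depth": 6, "min_child_weight": 1}
```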

#### Performing a Grid Search

Grid search is also widely used for hyperparameter optimization. This framework provides `gentun.populations.Grid`,
which can be used to conduct a grid search over a single generation pass. You must use genes which define the `sample()` method.
@@ -164,23 +202,23 @@
```python
population = Grid(genes, XGBoostCV, gene_samples, x_train, y_train, **kwargs)
```

Running the genetic algorithm on this population for just one generation is equivalent to doing a grid search over 10
`learning_rate` values, all `max_depth` values between 3 and 10, and all `min_child_weight` values between 0 and 10.
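
As a sketch, the `genes` and `gene_samples` in the call above could line up as follows; the gene class names and
signatures are assumptions, so verify them against `gentun.genes`:

```python
# Hypothetical gene classes -- the real names and signatures may differ.
from gentun.genes import RandomLogUniform, RandomUniform

genes = [
    RandomLogUniform("learning_rate", 0.001, 0.1),  # continuous, so it is sampled
    RandomUniform("max_depth", 3, 10),              # the 8 integers in [3, 10]
    RandomUniform("min_child_weight", 0, 10),       # the 11 integers in [0, 10]
]
# One sample count per gene, in the same order
gene_samples = [10, 8, 11]
```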

### Multiple Nodes

You can speed up the genetic algorithm by using several machines to evaluate individuals in parallel. One node has to
act as the *controller*, generating populations and running the genetic algorithm. Each time this *controller* node
needs to evaluate an individual from a population, it sends a request to a job queue that is processed by *workers*,
which receive the model's hyperparameters and perform model fitting through k-fold cross-validation. The more
*workers* you run, the faster the algorithm will evolve each generation.

#### Redis Setup

The simplest way to start the Redis service that will host the communication queues is through `docker`:

```shell
docker run -d --rm --name gentun-redis -p 6379:6379 redis
```
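
To confirm the service is reachable before starting the controller and workers, you can ping it, for example with the
`redis` Python client (used here only as a convenience check, not a gentun requirement):

```python
import redis

# Connect to the container started above and ping it
r = redis.Redis(host="localhost", port=6379)
assert r.ping()  # raises ConnectionError if the service is not up
```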

#### Controller Node

To run the distributed genetic algorithm, define a `gentun.services.RedisController` and pass it to the `Population`
instead of the `x_train` and `y_train` data. When the algorithm needs to evaluate the fitness of an individual, it will pass
@@ -198,7 +236,7 @@
```python
population = Population(genes, XGBoostCV, 100, controller=controller, **kwargs)
# ... run algorithm
```
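
The construction of `controller` is not shown in this excerpt; judging by the `RedisWorker` signature below, it
plausibly takes the experiment name and the Redis connection details, along these lines:

```python
from gentun.services import RedisController

# Signature inferred from the RedisWorker example below -- verify against
# gentun.services. The experiment name must match the one the workers use.
controller = RedisController("experiment", host="localhost", port=6379)
```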

#### Worker Nodes

Worker nodes are defined with the `gentun.services.RedisWorker` class, passing the model handler to it. Then, we use
its `run()` method with train data to begin processing jobs from the queue. You can use as many nodes as desired as long
@@ -214,7 +252,7 @@
```python
worker = RedisWorker("experiment", XGBoostCV, host="localhost", port=6379)
worker.run(x_train, y_train)
```

## Supported Models

This project supports hyperparameter tuning for the following models:

@@ -235,17 +273,17 @@
Our roadmap includes:

You can also help us speed up hyperparameter search by contributing your spare GPU time.

For more details on how to contribute, please check our [contribution guidelines](.github/CONTRIBUTING.md).

## References

### Genetic Algorithms

* Artificial Intelligence: A Modern Approach. 3rd edition. Section 4.1.4
* https://github.com/DEAP/deap
* http://www.theprojectspot.com/tutorial-post/creating-a-genetic-algorithm-for-beginners/3

### XGBoost Parameter Tuning

* http://xgboost.readthedocs.io/en/latest/parameter.html
* http://xgboost.readthedocs.io/en/latest/how_to/param_tuning.html
Binary file added assets/icon.png
1 change: 1 addition & 0 deletions src/gentun/__init__.py
@@ -1,6 +1,7 @@
"""
gentun - Distributed Genetic Algorithm for Hyperparameter Tuning
"""

from .config import setup_logging

__version__ = "0.0.1"
1 change: 1 addition & 0 deletions src/gentun/config.py
@@ -1,6 +1,7 @@
"""
logging configurations
"""

import logging
import sys

1 change: 1 addition & 0 deletions src/gentun/models/tensorflow.py
@@ -1,6 +1,7 @@
"""
Models implemented in tensorflow
"""

import logging
import os
from typing import Any, Sequence, Union
