diff --git a/CONTRIBUTE.md b/.github/CONTRIBUTING.md
similarity index 100%
rename from CONTRIBUTE.md
rename to .github/CONTRIBUTING.md
diff --git a/README.md b/README.md
index 25ce637..ea67c81 100644
--- a/README.md
+++ b/README.md
@@ -1,37 +1,69 @@
-# gentun: distributed genetic algorithm for hyperparameter tuning
+<p align="center">
+  <img src="assets/icon.png" alt="gentun">
+</p>
+
+<h1 align="center">gentun</h1>
+
+<p align="center">
+  Python package for distributed genetic algorithm-based hyperparameter tuning
+</p>
+
[![PyPI](https://img.shields.io/pypi/v/gentun)](https://pypi.org/project/gentun/)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/gentun)](https://pypi.org/project/gentun/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/gentun)](https://pypi.org/project/gentun/)
[![PyPI - License](https://img.shields.io/pypi/l/gentun)](https://pypi.org/project/gentun/)
+
+<details>
+  <summary>Table of Contents</summary>
+
+- [About The Project](#about-the-project)
+- [Installation](#installation)
+- [Usage](#usage)
+- [Supported Models](#supported-models)
+- [Contributing](#contributing)
+- [References](#references)
+
+</details>
+
+## About The Project
+
The goal of this project is to create a simple framework
for [hyperparameter](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)) tuning of machine learning models,
like Neural Networks and Gradient Boosting Trees, using a genetic algorithm. Evaluating the fitness of an individual in
-a population involves training a model with a specific set of hyperparameters, which is a time-consuming process. To
-address this problem, we provide a controller-worker . Multiple workers can handle model training and cross-validation
-of individuals provided by a controller while this controller manages the generation of offspring through reproduction
-and mutation.
+a population requires training a model with a specific set of hyperparameters, which is a time-consuming task. To
+address this issue, we offer a controller-worker system: multiple workers handle model training and cross-validation
+of the individuals they receive, while the controller manages the generation of offspring through reproduction and
+mutation.
*"Parameter tuning is a dark art in machine learning, the optimal parameters of a model can depend on many scenarios."*
~ [XGBoost tutorial](https://xgboost.readthedocs.io/en/stable/tutorials/param_tuning.html) on Parameter Tuning
-*"[...] The number of possible network structures increases exponentially with the number of layers in the network,
-which inspires us to adopt the genetic algorithm to efficiently traverse this large search space."* ~
-[Genetic CNN](https://arxiv.org/abs/1703.01513) paper
-
-- [Installation](#installation)
-- [Usage](#usage)
- - [Single node](#single-node)
- - [Pre-defined individuals](#adding-pre-defined-individuals)
- - [Grid search](#performing-a-grid-search)
- - [Multiple nodes](#multiple-nodes)
- - [Redis setup](#redis-setup)
- - [Controller](#controller-node)
- - [Workers](#worker-nodes)
-- [Supported models](#supported-models)
-- [Contributing](#contributing)
-- [References](#references)
+*"The number of possible network structures increases exponentially with the number of layers in the network, which
+inspires us to adopt the genetic algorithm to efficiently traverse this large search space."* ~
+[Genetic CNN paper](https://arxiv.org/abs/1703.01513)
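+
+As a quick preview, the distributed setup boils down to two small scripts: one controller and any number of workers.
+The sketch below condenses the Usage examples further down this page; the gene definitions, training data, and extra
+keyword arguments are elided, and the import paths and the `RedisController` constructor are assumed to mirror the
+snippets shown in the Usage section.
+
+```python
+# controller script: builds the population and drives the genetic algorithm
+from gentun.models.xgboost import XGBoostCV  # model handler (import path assumed)
+from gentun.populations import Population
+from gentun.services import RedisController
+
+# "experiment" is the shared job queue name; workers must use the same one
+controller = RedisController("experiment", host="localhost", port=6379)
+# genes and **kwargs are defined as in the single-node example below
+population = Population(genes, XGBoostCV, 100, controller=controller, **kwargs)
+# ... instantiate the genetic algorithm with this population and call its run() method
+```
+
+```python
+# worker script: run one copy per machine to evaluate individuals from the job queue
+from gentun.models.xgboost import XGBoostCV  # model handler (import path assumed)
+from gentun.services import RedisWorker
+
+worker = RedisWorker("experiment", XGBoostCV, host="localhost", port=6379)
+worker.run(x_train, y_train)  # x_train and y_train are loaded beforehand, as in Usage
+```
+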
## Installation
@@ -39,6 +71,12 @@ which inspires us to adopt the genetic algorithm to efficiently traverse this la
pip install gentun
```
+Some model handlers require additional libraries. You can install these optional dependencies with:
+
+```bash
+pip install "gentun[xgboost]" # or "gentun[tensorflow]"
+```
+
To set up a development environment, run:
```bash
@@ -49,7 +87,7 @@ flit install --deps develop --extras tensorflow,xgboost
## Usage
-### Single node
+### Single Node
The most basic way to run the algorithm is using a single machine, as shown in the following example where we use it to
find the optimal hyperparameters of an [`xgboost`](https://xgboost.readthedocs.io/en/stable/) model. First, we download
@@ -114,7 +152,7 @@ follows this convention. Nonetheless, to make the framework more flexible, you c
`algorithm.run()` to override this behavior and minimize your fitness metric instead (e.g. a loss such as *rmse* or
*binary crossentropy*).
-#### Adding pre-defined individuals
+#### Adding Pre-defined Individuals
Oftentimes, it's convenient to initialize the genetic algorithm with some known individuals instead of a random
population. You can add custom individuals to the population before running the genetic algorithm if you already have
@@ -137,7 +175,7 @@ population = Population(genes, XGBoostCV, 49, x_train, y_train, **kwargs)
population.add_individual(hyperparams)
```
-#### Performing a grid search
+#### Performing a Grid Search
Grid search is also widely used for hyperparameter optimization. This framework provides `gentun.populations.Grid`,
which can be used to conduct a grid search over a single generation pass. You must use genes which define the `sample()`
@@ -164,7 +202,7 @@ population = Grid(genes, XGBoostCV, gene_samples, x_train, y_train, **kwargs)
Running the genetic algorithm on this population for just one generation is equivalent to doing a grid search over 10
`learning_rate` values, all `max_depth` values between 3 and 10, and all `min_child_weight` values between 0 and 10.
-### Multiple nodes
+### Multiple Nodes
You can speed up the genetic algorithm by using several machines to evaluate individuals in parallel. One of the nodes has to
act as a *controller*, generating populations and running the genetic algorithm. Each time this *controller* node needs
@@ -172,7 +210,7 @@ to evaluate an individual from a population, it will send a request to a job que
receive the model's hyperparameters and perform model fitting through k-fold cross-validation. The more *workers* you
run, the faster the algorithm will evolve each generation.
-#### Redis setup
+#### Redis Setup
The simplest way to start the Redis service that will host the communication queues is through `docker`:
@@ -180,7 +218,7 @@ The simplest way to start the Redis service that will host the communication que
docker run -d --rm --name gentun-redis -p 6379:6379 redis
```
-#### Controller node
+#### Controller Node
To run the distributed genetic algorithm, define a `gentun.services.RedisController` and pass it to the `Population`
instead of the `x_train` and `y_train` data. When the algorithm needs to evaluate the fittest individual, it will pass
@@ -198,7 +236,7 @@ population = Population(genes, XGBoostCV, 100, controller=controller, **kwargs)
# ... run algorithm
```
-#### Worker nodes
+#### Worker Nodes
The worker nodes are defined using the `gentun.services.RedisWorker` class and passing the handler to it. Then, we use
its `run()` method with train data to begin processing jobs from the queue. You can use as many nodes as desired as long
@@ -214,7 +252,7 @@ worker = RedisWorker("experiment", XGBoostCV, host="localhost", port=6379)
worker.run(x_train, y_train)
```
-## Supported models
+## Supported Models
This project supports hyperparameter tuning for the following models:
@@ -235,17 +273,17 @@ Our roadmap includes:
You can also help us speed up hyperparameter search by contributing your spare GPU time.
-For more details on how to contribute, please check our [contribution guide](./CONTRIBUTE.md).
+For more details on how to contribute, please check our [contribution guidelines](.github/CONTRIBUTING.md).
## References
-### Genetic algorithms
+### Genetic Algorithms
* Artificial Intelligence: A Modern Approach. 3rd edition. Section 4.1.4
* https://github.com/DEAP/deap
* http://www.theprojectspot.com/tutorial-post/creating-a-genetic-algorithm-for-beginners/3
-### XGBoost parameter tuning
+### XGBoost Parameter Tuning
* http://xgboost.readthedocs.io/en/latest/parameter.html
* http://xgboost.readthedocs.io/en/latest/how_to/param_tuning.html
diff --git a/assets/icon.png b/assets/icon.png
new file mode 100644
index 0000000..c0b2009
Binary files /dev/null and b/assets/icon.png differ
diff --git a/src/gentun/__init__.py b/src/gentun/__init__.py
index 742d54a..54de4e9 100644
--- a/src/gentun/__init__.py
+++ b/src/gentun/__init__.py
@@ -1,6 +1,7 @@
"""
gentun - Distributed Genetic Algorithm for Hyperparameter Tuning
"""
+
from .config import setup_logging
__version__ = "0.0.1"
diff --git a/src/gentun/config.py b/src/gentun/config.py
index b3961e9..c8a0044 100644
--- a/src/gentun/config.py
+++ b/src/gentun/config.py
@@ -1,6 +1,7 @@
"""
logging configurations
"""
+
import logging
import sys
diff --git a/src/gentun/models/tensorflow.py b/src/gentun/models/tensorflow.py
index ba70ae4..8c6f340 100644
--- a/src/gentun/models/tensorflow.py
+++ b/src/gentun/models/tensorflow.py
@@ -1,6 +1,7 @@
"""
Models implemented in tensorflow
"""
+
import logging
import os
from typing import Any, Sequence, Union