Skip to content

A baseline implementation of genetic programming (using trees to encode programs) with some examples of usage.

License

Notifications You must be signed in to change notification settings

marcovirgolin/genepro

Repository files navigation

genepro

art of a juniper, 'ginepro' in Italian

Art of a juniper, "ginepro" in Italian, made with the genetic drawing repo by @anopara.

In brief

genepro is a Python library providing a baseline implementation of genetic programming, an evolutionary algorithm specialized to evolve programs. This library includes a classifier and regressor that are compatible with scitik-learn (see examples of usage below).

Evolving programs are represented as trees. The leaf nodes (also called terminals) of such trees represent some form of input, e.g., a feature for classification or regression, or a type of environmental observation for reinforcement learning. The internal nodes represent possible atomic instructions, e.g., summation, subtraction, multiplication, division, but also if-then-else or similar programming constructs.

Genetic programming operates on a population of trees, typically initialized at random. Every iteration (called generation), promising trees undergo random modifications (e.g., forms of crossover, mutation, and tuning) that result in a population of offspring trees. This new population is then used for the next generation.

animation of genepro finding a symbolic regression solution

Example of 1D symbolic regression (made with this gist)

Installation

For classification or regression, genepro relies only on a few libraries (numpy, joblib, and scikit-learn). However, additional libraries (e.g., gym) are required to run the reinforcement learning example. Thus, you can choose to perform a minimal or full installation.

Minimal installation

To perform a minimal installation, run:

pip install genepro

Full installation

For a full installation, clone this repo locally, and make use of the file requirements.txt, as follows:

git clone https://github.com/marcovirgolin/genepro
cd genepro
pip install -r requirements.txt .

Wish to use conda?

A conda virtual enviroment can easily be set up with:

git clone https://github.com/marcovirgolin/genepro
cd genepro
conda env create
conda activate genepro
pip install .

Examples of usage

Classification and regression

The notebook classification and regression.ipynb shows how to use genepro for classification and regression, via scikit-learn estimators.

These estimators are intended for data sets with a small number of (relevant) features, as the evolved program can be written as a compact (and potentially interpretable) symbolic expression.

...
gen: 39,	best of gen fitness: -2952.999,	best of gen size: 46
gen: 40,	best of gen fitness: -2950.453,	best of gen size: 44
The mean squared error on the test set is 2964.646 (respective R^2 score is 0.512)
Obtained by the (simplified) model: 146.527 + -5.797*(-x_2**2 - 4*x_2 - 3*x_3 + 2*x_4 - x_5 - x_6*(x_4 - x_5) + x_6 - 5*x_8)

Example of output of a symbolic regression model discovered for the Diabetes data set.

Reinforcement learning

The notebook gym.ipynb shows how genepro can be used to evolve a controller for the CartPole-v1 environment of the OpenAI gym library.

animation displaying a random cart pole controller

Left: random cart pole controller; Right: evolved symbolic cart pole controller:

(x2 + x3) * (x2*x3 + x3 + x4 + 1) * log(abs(x2))^2 * log(abs(x3))^2 < 0.5? 'left' else 'right'

Citation

If you use this software, please cite it with:

@software{Virgolin_genepro_2022,
  author = {Virgolin, Marco},
  month = {9},
  title = {{genepro}},
  url = {https://github.com/marcovirgolin/genepro},
  version = {0.1.3},
  year = {2024}
}