Random Carracing | GA Carracing |
---|---|
Random BipedalWalker-v2 | GA BipedalWalker-v2 | Small BipedalWalker-v2* |
---|---|---|
\* -- see the section on reducing the size of the neural network below.
Random Cartpole-v0 | GA Cartpole-v0 |
---|---|
GA SlimeVolley-v0 |
---|
The Slime Volleyball environment turned out to be quite hard for the proposed method. An interesting thing to observe is that the trained agent (on the right) tries to imitate the movements of its opponent. |
The project aims to train neural networks using genetic algorithms. Instead of minimizing the cost function with common optimizers such as SGD or Adam, a simple genetic algorithm (GA) was used. The algorithm alters the weights and biases of the neural network to achieve the best possible score of the fitness function. A detailed description of the genetic algorithm and of the environments used is given below.
Create a virtual environment (`python3 -m venv name-of-venv && source name-of-venv/bin/activate`) and install all necessary packages (`pip install -r requirements.txt`).
The pretrained models are located in the `models/{name_of_environment}` directory. For example, to check how a model performs in the BipedalWalker environment, specify the name of the pretrained model in the `tests/bipedalwalker/testing_model_bipedalwalker.py` script (you also have to adjust the model architecture to match it) and run that script.
Due to my limited computing resources, I trained the neural models on the Spell platform (I really recommend it for smaller projects):
```bash
spell run "python scripts/spell/bipedalwalker_mlp_spell.py" --pip-req requirements.txt
```
For the Carracing environment, a virtual display server has to be started first:
```bash
spell run --apt python-dev --apt cmake --apt zlib1g-dev --apt libjpeg-dev \
--apt xvfb --apt ffmpeg --apt xorg-dev --apt python-opengl --apt libboost-all-dev \
--apt libsdl2-dev --apt swig \
"Xvfb :1 -screen 0 1024x768x24 -ac +extension GLX +render -noreset &> xvfb.log ; export DISPLAY=:1 ; python scripts/spell/carracing_conv_spell.py" \
--pip-req requirements.txt
```
The question arises: since training a neural network amounts to minimizing a cost function, why can't we use a genetic algorithm instead of an optimizer such as SGD or Adam? The approach used in this project applies a simple GA to train neural networks in multiple OpenAI Gym environments.
The figure on the left-hand side depicts the steps needed to turn the weights and biases of the neural network into a genotype. First, the weight matrices are flattened and the bias vectors are concatenated to them. After each iteration of the genetic algorithm, the genotype (vector) is converted back into the original neural network architecture (the figure on the right-hand side).
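A minimal NumPy sketch of this encoding, assuming each layer is stored as a weight matrix of shape `(n_in, n_out)` plus a bias vector, and that layers are packed one after another (a flattened matrix followed by its bias). The packing order and the helper names `to_genotype` and `from_genotype` are hypothetical, not taken from the project's code.

```python
import numpy as np


def to_genotype(weights, biases):
    """Flatten each weight matrix, append its bias vector, and join all layers."""
    parts = []
    for W, b in zip(weights, biases):
        parts.append(W.ravel())
        parts.append(b.ravel())
    return np.concatenate(parts)


def from_genotype(genotype, layer_sizes):
    """Rebuild per-layer weight matrices (n_in, n_out) and bias vectors."""
    weights, biases = [], []
    i = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights.append(genotype[i:i + n_in * n_out].reshape(n_in, n_out))
        i += n_in * n_out
        biases.append(genotype[i:i + n_out])
        i += n_out
    if i != genotype.size:
        raise ValueError("genotype length does not match the architecture")
    return weights, biases
```

For the 24-20-12-12-4 BipedalWalker architecture mentioned later, this layout gives a genotype of 960 genes (480 + 20 + 240 + 12 + 144 + 12 + 48 + 4), and decoding it with the same layer sizes recovers the original matrices exactly.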
The vast majority of genetic algorithms are built from three major operations: selection, crossover and mutation. In these experiments I tested many different variants of each of them.
1. Setup -- As in many genetic algorithms, a population of individuals (neural networks) is created at the beginning. Their weights and biases are initialized randomly.
2. Selection -- The fitness function is calculated for each individual. I tested both ranking and roulette-wheel selection, and the former worked significantly better. The two individuals with the highest fitness scores are then selected as parents.
3. Crossover -- The 'parents' (two neural networks) are decomposed into flat vectors, and then either simple or BLX-alpha crossover is performed.
4. Mutation -- Each child has a chance to mutate, i.e., to have its weights or biases altered. I found that large mutations (drawn from a uniform distribution with a wide range) are vital for achieving better results.
5. Repeat -- If the sum of the fitness scores of the created children is greater than that of their parents, the children go to the next generation. The process then starts again from step 2 until the number of generations reaches the limit or the target fitness score is achieved.
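The operators listed above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the project's implementation: genotypes are 1-D arrays, "simple crossover" is taken to mean single-point crossover, mutation adds uniform noise gene by gene with a hypothetical per-gene `rate`, and accepted children are assumed to replace the two worst individuals (the text does not say whom they displace). All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng()


def rank_select(population, fitness, k=2):
    """Ranking selection: return the k individuals with the highest fitness."""
    order = np.argsort(fitness)[::-1]
    return [population[i] for i in order[:k]]


def roulette_select(population, fitness, k=2):
    """Roulette-wheel selection: sample k distinct individuals with probability
    proportional to fitness (shifted so that every weight is positive)."""
    f = np.asarray(fitness, dtype=float)
    f = f - f.min() + 1e-8
    idx = rng.choice(len(population), size=k, replace=False, p=f / f.sum())
    return [population[i] for i in idx]


def simple_crossover(p1, p2):
    """Single-point crossover of two flat genotypes (assumed meaning of 'simple')."""
    point = rng.integers(1, p1.size)
    return (np.concatenate([p1[:point], p2[point:]]),
            np.concatenate([p2[:point], p1[point:]]))


def blx_alpha_crossover(p1, p2, alpha=0.5):
    """BLX-alpha: each child gene is drawn uniformly from
    [min - alpha * d, max + alpha * d], where d = |p1 - p2| for that gene."""
    lo, hi = np.minimum(p1, p2), np.maximum(p1, p2)
    d = hi - lo
    return (rng.uniform(lo - alpha * d, hi + alpha * d),
            rng.uniform(lo - alpha * d, hi + alpha * d))


def uniform_mutation(genotype, rate=0.1, scale=1.0):
    """Add U(-scale, scale) noise to each gene with probability `rate` (assumed scheme)."""
    mask = rng.random(genotype.size) < rate
    return genotype + mask * rng.uniform(-scale, scale, genotype.size)


def ga_step(population, fitness_fn):
    """One generation: rank selection, BLX-alpha crossover, mutation, acceptance."""
    fitness = np.array([fitness_fn(g) for g in population])
    p1, p2 = rank_select(population, fitness)
    c1, c2 = blx_alpha_crossover(p1, p2)
    c1, c2 = uniform_mutation(c1), uniform_mutation(c2)
    parents_fitness = np.sort(fitness)[-2:].sum()
    # Children survive only if their summed fitness beats the parents' summed fitness.
    if fitness_fn(c1) + fitness_fn(c2) > parents_fitness:
        # Assumption: the children replace the two worst individuals.
        worst = np.argsort(fitness)[:2]
        population[worst[0]], population[worst[1]] = c1, c2
    return population
```

With a population of four or more, the two parents are never among the two replaced individuals, so the best fitness in the population can only stay the same or improve from one generation to the next. `roulette_select` and `simple_crossover` are included as the alternatives mentioned in the text and can be swapped into `ga_step`.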
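The calculation of the fitness function varies depending on the environment. A basic example of computing it is as follows:
```python
import gym

env = gym.make("your environment")

def compute_fitness_function(env, model, max_steps: int) -> float:
    # Classic Gym API (gym < 0.26): reset() returns obs, step() returns a 4-tuple
    obs = env.reset()
    fitness = 0.0
    for _ in range(max_steps):  # at most max_steps steps of a single episode
        action = model(obs)  # the network maps an observation to an action
        obs, reward, done, info = env.step(action)
        fitness += reward  # fitness = total reward collected in the episode
        if done:
            break
    return fitness
```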
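The snippet above follows the classic Gym API used at the time (BipedalWalker-v2). In Gym 0.26+ and Gymnasium, `reset()` returns `(observation, info)` and `step()` returns five values, and an episode ends when either `terminated` or `truncated` is true. A minimal sketch of the same function for that newer API:

```python
def compute_fitness_function(env, model, max_steps: int) -> float:
    """Total reward of one episode under the Gymnasium / gym>=0.26 API."""
    obs, info = env.reset()  # reset() returns (observation, info)
    fitness = 0.0
    for _ in range(max_steps):
        action = model(obs)  # the network maps an observation to an action
        # step() returns five values; the episode ends on termination or truncation
        obs, reward, terminated, truncated, info = env.step(action)
        fitness += reward
        if terminated or truncated:
            break
    return fitness
```

With Gymnasium installed, it can be called as, e.g., `compute_fitness_function(gymnasium.make("CartPole-v1"), model, 500)`, where `model` is any callable that maps an observation to a valid action.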
Mean value of the fitness function for the BipedalWalker-v2 problem |
The fitness function in this example is non-monotonic, and the variance for generations above 1000 is sizeable. Such behaviour was common during training.
Can we reduce the size of the neural network while keeping good performance in a particular environment? The first step was to gather training data for the smaller (student) model, i.e., I took the best model (from the BipedalWalker environment) and ran it multiple times, saving both the features (observations from the environment) and the labels (the model's actions). The initial neural model architecture was shrunk significantly, from `24-20-12-12-4` to `4-4-4`, while, interestingly, preserving the ability to complete the task.
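The text does not say how the student was fitted to the recorded data. One plausible option, sketched here purely as an assumption, is supervised regression: minimize the mean squared error between the student's outputs and the teacher's actions with plain full-batch gradient descent. The one-hidden-layer tanh network, its width, and the names `train_student` and `student_policy` are hypothetical and are not meant to reproduce the 4-4-4 architecture exactly.

```python
import numpy as np


def train_student(X, Y, hidden=4, lr=0.05, epochs=2000, seed=0):
    """Fit a one-hidden-layer tanh network to teacher actions Y (MSE loss).

    X: (n_samples, n_features) observations; Y: (n_samples, n_actions) in [-1, 1].
    Returns the parameters and the loss recorded at every epoch.
    """
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    W1, b1 = rng.normal(0, 0.5, (n_in, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0, 0.5, (hidden, n_out)), np.zeros(n_out)
    losses = []
    for _ in range(epochs):
        # Forward pass; the tanh output keeps actions in [-1, 1] like BipedalWalker's.
        H = np.tanh(X @ W1 + b1)
        P = np.tanh(H @ W2 + b2)
        err = P - Y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass for the mean squared error.
        dZ2 = 2 * err / err.size * (1 - P ** 2)
        dZ1 = (dZ2 @ W2.T) * (1 - H ** 2)
        # Full-batch gradient descent update.
        W2 -= lr * (H.T @ dZ2)
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X.T @ dZ1)
        b1 -= lr * dZ1.sum(axis=0)
    return (W1, b1, W2, b2), losses


def student_policy(params, obs):
    """Run the trained student on one observation."""
    W1, b1, W2, b2 = params
    return np.tanh(np.tanh(obs @ W1 + b1) @ W2 + b2)
```

Here `X` would hold the recorded BipedalWalker observations and `Y` the teacher's actions for those observations. The student could equally be trained with the genetic algorithm itself, using agreement with the teacher as the fitness.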
Example scheme for training the agent in two environments | Example model architecture for multitask learning |
I used the hard parameter sharing approach to train both the CartPole and BipedalWalker agents. The figures above depict what the neural model architecture looks like.
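A minimal forward-pass sketch of hard parameter sharing, in which both tasks use the same hidden layers and only the output heads are task-specific. The exact layout is an assumption: here the CartPole (4 values) and BipedalWalker (24 values) observations are concatenated into one input, which would be consistent with noise in one input affecting the other agent. The hidden sizes and the class name `SharedTrunkPolicy` are hypothetical, and in the project the weights would come from the GA rather than from the random initialization shown.

```python
import numpy as np


class SharedTrunkPolicy:
    """Shared hidden layers feeding a CartPole head and a BipedalWalker head."""

    def __init__(self, cartpole_dim=4, walker_dim=24, hidden=(20, 12), seed=0):
        rng = np.random.default_rng(seed)
        sizes = [cartpole_dim + walker_dim, *hidden]
        # Shared trunk: the same parameters serve both tasks.
        self.shared = [(rng.normal(0, 0.5, (a, b)), np.zeros(b))
                       for a, b in zip(sizes[:-1], sizes[1:])]
        # Task-specific heads: CartPole has 2 discrete actions,
        # BipedalWalker has 4 continuous actions in [-1, 1].
        self.cartpole_head = (rng.normal(0, 0.5, (sizes[-1], 2)), np.zeros(2))
        self.walker_head = (rng.normal(0, 0.5, (sizes[-1], 4)), np.zeros(4))

    def __call__(self, cartpole_obs, walker_obs):
        h = np.concatenate([cartpole_obs, walker_obs])
        for W, b in self.shared:
            h = np.tanh(h @ W + b)
        W, b = self.cartpole_head
        cartpole_action = int(np.argmax(h @ W + b))  # 0 or 1
        W, b = self.walker_head
        walker_action = np.tanh(h @ W + b)  # continuous torques
        return cartpole_action, walker_action
```

Because every layer of the trunk is shared, a single genotype encodes both agents, so the GA optimizes them jointly, e.g. with the sum of the two environments' rewards as the fitness (an assumed choice).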
I also checked how the trained model behaves when it is fed different input values, that is:
1. Both agents receive correct observations from their environments.
2. The CartPole agent receives random noise from a uniform distribution, while the BipedalWalker agent receives correct observations.
3. The CartPole agent receives correct observations, while the BipedalWalker agent receives random noise from a uniform distribution.
4. Both agents receive random noise from a uniform distribution.
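The four test scenarios can be expressed as a small input-switching helper that is applied before each call to the model. The function name `scenario_inputs` and the noise range `[low, high] = [-1, 1]` are assumptions; the text only states that the noise comes from a uniform distribution.

```python
import numpy as np


def scenario_inputs(scenario, cartpole_obs, walker_obs, rng, low=-1.0, high=1.0):
    """Return the (CartPole, BipedalWalker) model inputs for test scenarios 1-4."""
    if scenario not in (1, 2, 3, 4):
        raise ValueError("scenario must be 1, 2, 3 or 4")
    noise_cartpole = scenario in (2, 4)  # CartPole input replaced by noise
    noise_walker = scenario in (3, 4)    # BipedalWalker input replaced by noise
    cp = (rng.uniform(low, high, np.shape(cartpole_obs)) if noise_cartpole
          else np.asarray(cartpole_obs))
    bw = (rng.uniform(low, high, np.shape(walker_obs)) if noise_walker
          else np.asarray(walker_obs))
    return cp, bw
```

Only the model's inputs are replaced; each environment still steps with the action the model produces, so the rewards show how much each agent depends on its own observations.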
The results are shown below. In conclusion, both model inputs are important, but not equally so. In scenario 2, despite the noisy CartPole observations, the BipedalWalker agent still performs reasonably well.
1. BipedalWalker | 1. CartPole |
2. BipedalWalker | 2. CartPole |
3. BipedalWalker | 3. CartPole |
4. BipedalWalker | 4. CartPole |