This project uses deep reinforcement learning (DRL) to play the Snake game automatically. The core DRL method is PPO (Proximal Policy Optimization) for discrete action spaces. PPO performs well on discrete action spaces, just as it does on continuous ones. About half an hour of training is enough to produce a working snake agent.
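For reference, here is the clipped surrogate objective that PPO optimizes, applied to a discrete (categorical) policy. This is a generic NumPy sketch, not the project's `train.py`. The function names are hypothetical, and the clip range `clip_eps=0.2` is a common default that is assumed here. The snake's action set, such as four absolute moves or three relative turns, is whatever the environment defines.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def ppo_clip_loss(new_logits: np.ndarray,
                  old_log_probs: np.ndarray,
                  actions: np.ndarray,
                  advantages: np.ndarray,
                  clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate loss (to be minimized) for a categorical policy.

    new_logits:    (batch, n_actions) logits from the current policy
    old_log_probs: (batch,) log pi_old(a|s) stored when the data was collected
    actions:       (batch,) integer indices of the actions that were taken
    advantages:    (batch,) advantage estimates for those actions
    """
    # log pi_new(a|s) for the actions actually taken
    log_probs = log_softmax(new_logits)
    new_log_probs = log_probs[np.arange(len(actions)), actions]
    # probability ratio pi_new / pi_old
    ratio = np.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the elementwise minimum, so negate it to get a loss
    return float(-np.mean(np.minimum(unclipped, clipped)))
```

When the new and old policies are identical, the ratio is 1 and the loss reduces to `-mean(advantages)`. When an update would push the ratio past `1 + clip_eps` for a positive advantage, the gain is capped, and this is what keeps PPO updates conservative. In PyTorch, `torch.distributions.Categorical(logits=...).log_prob(actions)` gives the same per-action log-probabilities.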
```bash
conda create -n py311 python=3.11 -y
conda activate py311
pip install -r requirements.txt

python train.py  # after training, the training curve of the current round is shown automatically
python snake.py  # evaluate the latest saved model
python eval.py   # optionally: --weight ./model/act-weight_round3_472_82.5.pkl
python plot.py   # optionally: --history ./logs/reward_round3_82.5.csv
```
Round | 1 | 2 | 3 |
---|---|---|---|
Training curve | |||
Evaluation | |||
Reward_eat | +2.0 | +2.0 | +2.0 |
Reward_hit | -0.5 | -1.0 | -1.5 |
Reward_bit | -0.8 | -1.5 | -2.0 |
Avg record | ≈19 | ≈23 | ≈28 |
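The per-round reward settings in the table can be kept in a small configuration. The sketch below is hypothetical: `RewardConfig`, `ROUNDS` and `step_reward` are not taken from the project. It assumes that `Reward_hit` is the penalty for hitting a wall, that `Reward_bit` is the penalty for biting the snake's own body, and that an ordinary step earns `0.0`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    eat: float  # reward for eating food (Reward_eat)
    hit: float  # penalty for hitting a wall (Reward_hit, assumed meaning)
    bit: float  # penalty for biting its own body (Reward_bit, assumed meaning)


# Values from the reward rows of the results table, one config per round
ROUNDS = {
    1: RewardConfig(eat=2.0, hit=-0.5, bit=-0.8),
    2: RewardConfig(eat=2.0, hit=-1.0, bit=-1.5),
    3: RewardConfig(eat=2.0, hit=-1.5, bit=-2.0),
}


def step_reward(cfg: RewardConfig, ate_food: bool,
                hit_wall: bool, bit_self: bool) -> float:
    """Hypothetical per-step reward; the event flags come from the environment."""
    # Death events end the episode, so they take priority over eating
    if hit_wall:
        return cfg.hit
    if bit_self:
        return cfg.bit
    if ate_food:
        return cfg.eat
    return 0.0  # assumed: no reward for an ordinary move
```

With the rewards isolated like this, comparing rounds becomes a one-line change, for example `step_reward(ROUNDS[3], ...)`.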
- Increasing the death penalty leads to a higher average record
- With a low death penalty, the training reward curve stays low, but the agent still performs well in the demo
- A very high reward for eating food pushes the agent toward quick gains at the expense of long-term safety
- Training time is too short to show the advantage of DRL over a non-DRL method (Snaqe)
- The snake's body tends to zigzag, which looks unnatural; a possible fix is to add a penalty to the reward for excessive zigzagging
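One way to express the proposed zigzag penalty is as an extra reward-shaping term that counts direction changes over the last few moves. This is a hypothetical sketch that the project does not implement. The class name is illustrative, and the values `window=8`, `max_turns=4` and `penalty=-0.05` are untuned assumptions.

```python
from collections import deque
from typing import Hashable


class ZigzagPenalty:
    """Hypothetical shaping term that penalizes frequent direction changes.

    Remembers the last `window` moves. When the number of turns (consecutive
    moves with different directions) exceeds `max_turns`, the call returns
    `penalty` (a negative value); otherwise it returns 0.0.
    """

    def __init__(self, window: int = 8, max_turns: int = 4,
                 penalty: float = -0.05) -> None:
        self.moves: deque = deque(maxlen=window)
        self.max_turns = max_turns
        self.penalty = penalty

    def reset(self) -> None:
        """Call at the start of every episode."""
        self.moves.clear()

    def __call__(self, direction: Hashable) -> float:
        self.moves.append(direction)
        moves = list(self.moves)
        # count positions where the direction differs from the previous move
        turns = sum(1 for a, b in zip(moves, moves[1:]) if a != b)
        return self.penalty if turns > self.max_turns else 0.0
```

The term would be added to the base step reward. Its magnitude should stay small relative to `Reward_eat`, so that the agent is nudged toward smoother paths without being discouraged from the turns it needs to reach food.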