snakeAI

Testing MLP, DQN, PPO, SAC, and policy gradient with the snake game.

1. Features

  • A* agent

    a method based on a greedy policy

  • random search agent

    a method based on reward and MCMC

  • DQN

  • DPG

  • PPO

  • SAC

  • Tanh-Norm

    an approximation of RMS-Norm with more robust performance in off-policy learning (see the sketch below)
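
The exact Tanh-Norm formulation is not shown in this README, so the following is only a minimal sketch of one plausible reading: compute the vector's RMS as in RMS-Norm, divide each element by it, and squash the result with tanh so activations stay in (-1, 1). The function name tanhNorm and the epsilon are illustrative assumptions, not the repository's code.

#include <cmath>
#include <vector>

/* Hypothetical Tanh-Norm sketch: divide by the RMS of the vector (as in
   RMS-Norm), then squash with tanh to keep every activation in (-1, 1). */
std::vector<float> tanhNorm(const std::vector<float> &x, float eps = 1e-5f)
{
    float meanSquare = 0;
    for (float v : x) {
        meanSquare += v * v;
    }
    meanSquare /= x.size();
    float rms = std::sqrt(meanSquare + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); i++) {
        y[i] = std::tanh(x[i] / rms);
    }
    return y;
}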

2. Tricks

  • initialize weights from the uniform distribution U(-1, 1)

  • use RMSProp as the optimizer

  • use layer-norm in on-policy methods

  • use weight-decay in on-policy methods

  • normalize gradients (see the sketch after this list)

  • keep the reward range symmetric, with values between -1 and 1
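
A minimal sketch of two of the tricks above: drawing every weight from U(-1, 1) and normalizing the gradient before the update step. The function names and the choice of unit L2 norm for gradient normalization are assumptions for illustration, not the repository's exact code.

#include <cmath>
#include <random>
#include <vector>

/* Draw every weight from the uniform distribution U(-1, 1). */
void initUniform(std::vector<float> &w, unsigned int seed = 0)
{
    std::default_random_engine engine(seed);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    for (float &v : w) {
        v = dist(engine);
    }
}

/* Scale the gradient to unit L2 norm before the optimizer step. */
void normalizeGradient(std::vector<float> &g, float eps = 1e-8f)
{
    float norm = 0;
    for (float v : g) {
        norm += v * v;
    }
    norm = std::sqrt(norm) + eps;
    for (float &v : g) {
        v /= norm;
    }
}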

3. Reward

  • reward at position
float Agent::reward0(int xi, int yi, int xn, int yn, int xt, int yt)
{
    /* the agent goes out of the map */
    if (map(xn, yn) == 1) {
        return -1;
    }
    /* the agent reaches the target's position */
    if (xn == xt && yn == yt) {
        return 1;
    }
    /* squared distance from the agent's previous position to the target */
    float d1 = (xi - xt) * (xi - xt) + (yi - yt) * (yi - yt);
    /* squared distance from the agent's current position to the target */
    float d2 = (xn - xt) * (xn - xt) + (yn - yt) * (yn - yt);
    return std::sqrt(d1) - std::sqrt(d2);
}
  • cumulative reward per epoch

    (figure: DQN cumulative reward per epoch)

The reward the agent receives decreases as it gets closer to the target, until it reaches the target's position; otherwise, the agent tends to become overconfident.
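
The standalone example below is for illustration only: it replays the distance-shaping term of reward0 over a made-up straight-line path and sums the per-step rewards into one cumulative value for the episode. The helper function, trajectory, and target position are invented for this example and do not come from the repository's training loop.

#include <cmath>
#include <cstdio>
#include <utility>
#include <vector>

/* Distance-shaping term of reward0, without the wall / target checks. */
static float shaping(int xi, int yi, int xn, int yn, int xt, int yt)
{
    float d1 = std::sqrt(float((xi - xt) * (xi - xt) + (yi - yt) * (yi - yt)));
    float d2 = std::sqrt(float((xn - xt) * (xn - xt) + (yn - yt) * (yn - yt)));
    return d1 - d2;
}

int main()
{
    /* a made-up straight-line trajectory toward a target at (5, 0) */
    std::vector<std::pair<int, int>> path = {{0,0},{1,0},{2,0},{3,0},{4,0},{5,0}};
    int xt = 5, yt = 0;
    float total = 0;
    for (std::size_t t = 1; t < path.size(); t++) {
        bool reached = (path[t].first == xt && path[t].second == yt);
        float r = reached ? 1.0f
                          : shaping(path[t - 1].first, path[t - 1].second,
                                    path[t].first, path[t].second, xt, yt);
        total += r;   /* cumulative reward for this episode */
    }
    std::printf("cumulative reward: %f\n", total);
    return 0;
}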
