This paper uses pre-prepared algorithms to guide a reinforcement learning agent and speed up the start of its training: https://arxiv.org/pdf/2008.12001.pdf
We're currently giving the neural network a warm start by training directly on Stockfish information, but this may produce training data that's too different from the MCTS priors. If we instead use Stockfish "advice" within MCTS, it could produce more similar data and thus lead to more efficient training once we switch to training with MCTS only. @JuddBE also proposed three phases: start with our current method, then use Stockfish as a trainer, then move on to MCTS only.
The implementation would depend on how exactly we want to use the Stockfish advice. Most likely, the main change would be having MCTS take a value supplier function and a priors supplier function as parameters, and then implementing those functions as needed for the Stockfish trainer.
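To make the supplier idea concrete, here is a rough sketch of what an MCTS with pluggable value and priors suppliers might look like. It is not our actual MCTS code: every name in it (`run_mcts`, `Node`, `trainer_value`, `uniform_priors`) is hypothetical, and a tiny Nim game (take 1 or 2 stones, whoever takes the last stone wins) stands in for chess so the example runs without Stockfish. The `trainer_value` function plays the role a Stockfish-backed supplier would play; a network-backed supplier would have the same signature.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical supplier signatures (assumptions, not our existing API).
# Values are from the perspective of the player to move in `state`.
ValueSupplier = Callable[[int], float]
PriorsSupplier = Callable[[int, List[int]], Dict[int, float]]


@dataclass
class Node:
    prior: float
    visits: int = 0
    value_sum: float = 0.0
    children: Dict[int, "Node"] = field(default_factory=dict)

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(parent: Node, child: Node, c_puct: float) -> float:
    # child.q() is from the opponent's perspective, so negate it.
    explore = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return -child.q() + explore


def run_mcts(root_state, legal_moves, apply_move,
             value_supplier: ValueSupplier, priors_supplier: PriorsSupplier,
             simulations: int = 200, c_puct: float = 1.5) -> Dict[int, int]:
    """Return visit counts per root move; MCTS never knows who supplies advice."""
    root = Node(prior=1.0)
    for _ in range(simulations):
        node, state, path = root, root_state, [root]
        # Selection: descend until we reach a node that has not been expanded.
        while node.children:
            parent = node
            move, node = max(parent.children.items(),
                             key=lambda kv: puct_score(parent, kv[1], c_puct))
            state = apply_move(state, move)
            path.append(node)
        # Expansion: priors come from whichever supplier was passed in.
        moves = legal_moves(state)
        if moves:
            priors = priors_supplier(state, moves)
            node.children = {m: Node(prior=priors[m]) for m in moves}
        # Evaluation and backup, flipping the sign at each ply.
        value = value_supplier(state)
        for visited in reversed(path):
            visited.visits += 1
            visited.value_sum += value
            value = -value
    return {m: c.visits for m, c in root.children.items()}


# Toy game standing in for chess.
def nim_moves(stones: int) -> List[int]:
    return [m for m in (1, 2) if m <= stones]


def nim_apply(stones: int, move: int) -> int:
    return stones - move


# Stand-in for a Stockfish-backed trainer: exact values for this toy game.
def trainer_value(stones: int) -> float:
    return -1.0 if stones % 3 == 0 else 1.0


def uniform_priors(stones: int, moves: List[int]) -> Dict[int, float]:
    return {m: 1.0 / len(moves) for m in moves}


visits = run_mcts(4, nim_moves, nim_apply, trainer_value, uniform_priors)
best_move = max(visits, key=visits.get)
```

With this shape, the phases proposed above would only differ in which pair of functions is passed to `run_mcts`; the search loop itself stays unchanged.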