This repository has been archived by the owner on Mar 11, 2021. It is now read-only.
Once a move is played, we have to search it "for real". But the policy priors for that move are the ones produced by whatever random rotation happened to be used when the node was first expanded during search.
So, before beginning tree search, evaluate the root with all 8 symmetries and average the policy together.
This could be put behind a flag and directly tested for rating effect.
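The proposed pre-search step could be sketched roughly as follows. This is a minimal sketch, not Minigo's actual code: it assumes a 9x9 board and a hypothetical `run_net` callable returning `(policy, value)`, where the policy is flat with length `N*N + 1` and the last entry is the pass move (which is symmetry-invariant).

```python
import numpy as np

N = 9  # board size (9x9, as in the selfplay example discussed below)

def apply_symmetry(plane, k, flip):
    """One of the 8 dihedral symmetries: optional left-right flip,
    then rotation by k * 90 degrees."""
    if flip:
        plane = np.fliplr(plane)
    return np.rot90(plane, k)

def invert_symmetry(plane, k, flip):
    """Undo apply_symmetry: rotate back, then un-flip."""
    plane = np.rot90(plane, -k)
    if flip:
        plane = np.fliplr(plane)
    return plane

def averaged_policy(board, run_net):
    """Evaluate the net on all 8 symmetries of `board`, map each policy
    back into the original orientation, and average. `run_net` is a
    hypothetical (board -> (policy, value)) callable."""
    policies, values = [], []
    for flip in (False, True):
        for k in range(4):
            sym_board = apply_symmetry(board, k, flip)
            policy, value = run_net(sym_board)
            # Inverse-transform the board moves; the pass logit needs no transform.
            moves = invert_symmetry(policy[:-1].reshape(N, N), k, flip)
            policies.append(np.concatenate([moves.ravel(), policy[-1:]]))
            values.append(value)
    return np.mean(policies, axis=0), float(np.mean(values))
```

The averaged policy would then seed the root's priors before tree search begins; behind a flag, this is a small, isolated change to test for rating effect.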
This observation applies at all nodes, not just the root node. And the root node is the one most likely to have its policy prior overridden, since it accumulates all of the subsequent evaluation data. I think it might make sense for the oscillating playouts, though, since there you have fewer reads and the result depends heavily on the initial policy priors. (This is doubly so with the -1 value prior, which we know tends to make the MCTS go very deep on the first move.)
Yes, it applies to all nodes, but spending 8x the reads is (generally) better than averaging the 8 symmetry evaluations. With the inference cache, rebuilding the tree is close to free, so it's easy to drop the tree, get a 'truer' top-level policy, and rebuild. This goes double for PCO, since we only drop the tree before a 'full readout'.
We've added a symmetry-aware inference cache that averages all symmetries for a position, which (assuming an infinite cache) tends towards returning the average of all 8 symmetries. For example, when running 9x9 selfplay with a 32GB cache, we end up with a cache hit rate of >60%, which strongly indicates that nearly all of the games end up looking like each other. So if we did something like this suggestion, we'd also have to add more noise at the root, or possibly to every inference.
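A symmetry-aware cache along these lines could be sketched as below. This is a hypothetical illustration, not the actual implementation: it canonicalizes a position by taking the lexicographically smallest of its 8 symmetry variants, and keeps a running average of values only (averaging policies too would additionally require inverse-transforming each policy into the canonical orientation).

```python
import numpy as np

def canonical_key(board):
    """Canonicalize a board under the 8 dihedral symmetries by taking
    the lexicographically smallest byte representation, so all 8
    orientations of a position share one cache entry."""
    variants = []
    for flip in (False, True):
        b = np.fliplr(board) if flip else board
        for k in range(4):
            variants.append(np.rot90(b, k).tobytes())
    return min(variants)

class SymmetryAveragingCache:
    """Hypothetical sketch: lookups under any symmetry of a position hit
    the same entry, whose value converges toward the 8-symmetry average
    as evaluations under different orientations accumulate."""

    def __init__(self):
        self._store = {}  # canonical key -> (count, running mean value)

    def add(self, board, value):
        key = canonical_key(board)
        count, mean = self._store.get(key, (0, 0.0))
        count += 1
        mean += (value - mean) / count  # incremental mean update
        self._store[key] = (count, mean)

    def lookup(self, board):
        entry = self._store.get(canonical_key(board))
        return None if entry is None else entry[1]
```

Because all orientations of a position collapse to one key, symmetric games collide in the cache, which is consistent with the high hit rate described above.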