Alternatives to MCTS? #245
beppeben
started this conversation in
Conceptual
Replies: 0 comments
This is probably a trivial question.
While reading about MCTS for the first time I was wondering if one could make a self-play learner that is simpler than that.
The approach that would come most naturally to me would be simply to learn a network V(s, a) that outputs the expected reward of taking action a in board state s.
During each step of self-play, one would sample an action with probability proportional to V(s, a) by evaluating the network over all possible actions (with a temperature parameter for better exploration). The loss function would then just be the mean of [V(s_t, a_t) - z_t]^2 over all sampled states s_t, chosen actions a_t, and obtained rewards z_t.
This way the network would be smaller (one output value instead of a full distribution over moves), the loss function would be simpler, and there would be no auxiliary variables (like the Q and N statistics that MCTS maintains).
I suppose I'm missing something, since otherwise someone would have done it already. Does anyone know why this doesn't work?
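For concreteness, here is a minimal sketch of the scheme described above, using a toy linear model in place of a neural network. The feature map `features` and all hyperparameters are illustrative assumptions, not part of the original proposal; one small deviation is that sampling uses a softmax over V(s, a) with a temperature, rather than raw proportionality, since predicted values can be negative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(values, temperature=1.0):
    # Softmax with temperature; used instead of raw proportional sampling
    # because V(s, a) can be negative.
    z = (values - values.max()) / temperature
    e = np.exp(z)
    return e / e.sum()

def features(state, action, n_actions):
    # Hypothetical feature map standing in for a real network's input encoding.
    phi = np.zeros(n_actions)
    phi[action] = 1.0 + 0.1 * state
    return phi

def select_action(w, state, n_actions, temperature=1.0):
    # Evaluate V(s, a) = w . phi(s, a) for every legal action, then sample.
    values = np.array([features(state, a, n_actions) @ w
                       for a in range(n_actions)])
    return rng.choice(n_actions, p=softmax(values, temperature))

def update(w, episode, z, lr=0.1):
    # One gradient step per visited (s_t, a_t) on the squared error
    # (V(s_t, a_t) - z)^2 against the final game outcome z.
    for state, action, n_actions in episode:
        phi = features(state, action, n_actions)
        v = phi @ w
        w -= lr * (v - z) * phi
    return w
```

A self-play loop would then alternate `select_action` to generate games and `update` with the final reward z of each game, with no tree search, visit counts, or policy head involved.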