Why mean over all actions sampled in multi outcome sampling #7

annw0922 · 2020-06-24T08:04:38Z

https://github.com/EricSteinberger/Deep-CFR/blob/master/DeepCFR/workers/la/sampling_algorithms/MultiOutcomeSampler.py

Since 'aprx_imm_reg' is computed for every action here and put into the buffer without being summed up, I don't understand the reason for this line:
'aprx_imm_reg *= legal_action_mask / n_actions_to_smpl'

I think this is because I could not understand the formula in the docstring (v~(I) = * p(a) * |A(I)|), and I failed to find the corresponding part in your paper:
"""
Last state values are the average, not the sum of all samples of that state since we add
v~(I) = * p(a) * |A(I)|. Since we sample multiple actions on each traverser node, we have to average over
their returns like: v~(I) * Sum_a=0_N (v~(I|a) * p(a) * ||A(I)|| / N).
"""

Is there any reference for it?

Thanks a lot.

EricSteinberger · 2020-11-10T22:30:44Z

Hi! This is to make sure that the estimate is not scaled up just because you sample more actions. The regrets get more accurate the more actions you sample, but the expectation of the value should stay the same and not grow linearly with the number of sampled actions. Does this make sense? You are right that it's not in the paper. Thank you for checking before opening the issue, appreciated! This is an implementation detail, and the paper itself doesn't use multi-outcome sampling (MOS); it uses external sampling, where this division doesn't really matter.
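
The point about the expectation can be checked numerically. The following is a minimal sketch, not the repository's code: it assumes actions are sampled uniformly without replacement at a traverser node, and the names `estimate_value`, `mean_estimate`, `action_values`, and `strategy`, as well as the toy numbers, are hypothetical. Each sampled action contributes v~(I|a) * p(a) * |A(I)|, as in the docstring quoted in the question, and the contributions are either averaged over the N sampled actions or simply summed.

```python
import random

# Toy node with 4 legal actions (hypothetical numbers, for illustration only).
ACTION_VALUES = [1.0, -2.0, 0.5, 3.0]   # v~(I|a) for each action a
STRATEGY = [0.1, 0.2, 0.3, 0.4]         # p(a), sums to 1
TRUE_VALUE = sum(p * v for p, v in zip(STRATEGY, ACTION_VALUES))


def estimate_value(action_values, strategy, n_samples, rng, average=True):
    """Estimate sum_a p(a) * v(I|a) from n_samples uniformly sampled actions."""
    n_actions = len(action_values)
    # Assumption: uniform sampling without replacement over the legal actions.
    sampled = rng.sample(range(n_actions), n_samples)
    # Each sample is importance-weighted by |A(I)| (inverse of the uniform
    # sampling probability 1 / |A(I)|).
    total = sum(action_values[a] * strategy[a] * n_actions for a in sampled)
    # Dividing by N turns the sum of N one-sample estimates into their mean.
    return total / n_samples if average else total


def mean_estimate(n_samples, average, trials=20000, seed=0):
    """Monte Carlo mean of the estimator over many independent draws."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        acc += estimate_value(ACTION_VALUES, STRATEGY, n_samples, rng, average)
    return acc / trials


if __name__ == "__main__":
    print("true value:", TRUE_VALUE)
    for n in (1, 2, 4):
        print(n, "averaged:", mean_estimate(n, True),
              "summed:", mean_estimate(n, False))
```

Under these assumptions, the averaged estimate stays close to the true value for every N, while the summed estimate grows roughly as N times the true value. Sampling more actions only reduces the variance of the averaged estimate.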
