Release v5.0.0 · joshuaspear/offline_rl_ope

Corrected the implementation of per-decision weighted importance sampling
Expanded the different types of weights that can be implemented based on:
- http://proceedings.mlr.press/v48/jiang16.pdf: Per-decision weights are defined as the average weight at a given timepoint, which results in a different denominator for each timepoint. This is implemented with `WISWeightNorm(avg_denom=True)`
- https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1079&context=cs_faculty_pubs: Per-decision weights are defined as the sum of discounted weights across all timesteps. This is implemented with `WISWeightNorm(discount=discount_value)`
- Combinations of different weights can be easily implemented, for example 'average discounted weights' with `WISWeightNorm(discount=discount_value, avg_denom=True)`. However, these do not necessarily have backing from the literature.
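
To make the difference between the two denominators concrete, here is a minimal pure-Python sketch that follows only the descriptions above. It is not the library's code: the function names `stepwise_average_denominators` and `discounted_sum_denominator` are hypothetical, and the sketch assumes every trajectory has the same length and that `weights[i][t]` already holds the cumulative importance weight of trajectory `i` at timestep `t`.

```python
def stepwise_average_denominators(weights):
    """One denominator per timestep: the mean weight across trajectories.

    Hypothetical helper illustrating the 'average weight at a given
    timepoint' description; assumes equal-length trajectories.
    """
    n = len(weights)
    horizon = len(weights[0])
    return [sum(traj[t] for traj in weights) / n for t in range(horizon)]


def discounted_sum_denominator(weights, discount):
    """A single denominator: the discounted weights summed over all
    trajectories and all timesteps.

    Hypothetical helper illustrating the 'sum of discounted weights
    across all timesteps' description.
    """
    return sum(
        (discount ** t) * w
        for traj in weights
        for t, w in enumerate(traj)
    )


# Two trajectories, two timesteps each.
weights = [[1.0, 2.0], [3.0, 4.0]]

per_step = stepwise_average_denominators(weights)
# t=0: (1 + 3) / 2 = 2.0, t=1: (2 + 4) / 2 = 3.0

single = discounted_sum_denominator(weights, discount=0.5)
# 1*1 + 0.5*2 + 1*3 + 0.5*4 = 7.0
```

The first variant yields a list with one entry per timestep, while the second collapses everything into one scalar. This is why the two normalisations can give different estimates on the same data.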
`EffectiveSampleSize` metric optionally returns nan if all weights are 0
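
As an illustration of why the all-zero case needs special handling, the sketch below uses the common effective sample size formula, (sum of weights)^2 / (sum of squared weights), which becomes 0/0 when every weight is zero. This is an assumption-laden sketch rather than the library's metric: the function `effective_sample_size`, the flag `nan_if_all_zero`, and the fallback value of 0.0 when the flag is off are all hypothetical choices made for this example.

```python
import math


def effective_sample_size(weights, nan_if_all_zero=False):
    """Effective sample size: (sum w)^2 / sum(w^2).

    Hypothetical helper; the flag name and the 0.0 fallback are
    assumptions of this sketch, not the library's API.
    """
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0.0:
        # Every weight is zero, so the ratio is undefined (0 / 0).
        return math.nan if nan_if_all_zero else 0.0
    return (total * total) / total_sq


ess_uniform = effective_sample_size([1.0, 1.0, 1.0, 1.0])
# (4^2) / 4 = 4.0: equal weights recover the full sample count

ess_zero = effective_sample_size([0.0, 0.0], nan_if_all_zero=True)
# nan, flagging that the estimate is undefined
```

Returning nan makes the degenerate case visible in downstream logs instead of silently reporting a number.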
Bug fixes:
- Fixed a bug when running on CUDA where tensors were not being moved to the CPU
- Improved static typing
