Transformers trained with REINFORCE and V-MPO
Run tests/gtrxl_test.py with --state_rep=gtrxl to run the architecture with GTrXL.
Run tests/gtrxl_test.py with --state_rep=coberl to run the same architecture with CoBERL in place of GTrXL.
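As a rough illustration of how a flag like --state_rep can select between the two modules, here is a minimal sketch using Python's standard argparse. The function name parse_state_rep and the default value are assumptions made for this example; the actual parsing in tests/gtrxl_test.py may differ.

```python
import argparse


def parse_state_rep(argv):
    """Parse a --state_rep flag and return the chosen representation name.

    Hypothetical helper for illustration only; not taken from the repository.
    """
    parser = argparse.ArgumentParser(description="Select the state representation.")
    parser.add_argument(
        "--state_rep",
        choices=["gtrxl", "coberl"],  # the two values named in this README
        default="gtrxl",              # assumed default, not confirmed by the source
        help="state-representation module to build",
    )
    return parser.parse_args(argv).state_rep
```

Restricting the flag with choices makes a mistyped value fail at startup with a usage message rather than later during model construction.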
To use GTrXL as the policy-estimating network, run pg.py with the boolean Trans=True set inside the script. To evaluate the common MLP estimator instead, set Trans=False.
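The switch described above can be pictured as a single module-level boolean that decides which estimator gets built. The sketch below is only an assumption about the pattern: Trans is the name used in this README, while select_policy_kind and the returned labels are hypothetical and not taken from pg.py.

```python
# Hypothetical sketch of a boolean switch between two policy estimators.
Trans = True  # True: GTrXL policy network; False: common MLP policy network


def select_policy_kind(use_transformer: bool) -> str:
    """Return a label for the policy estimator that the flag selects."""
    return "gtrxl" if use_transformer else "mlp"


# In a real script this label would decide which network class to construct.
policy_kind = select_policy_kind(Trans)
```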