Fold 2D HP chains using reinforcement learning. A work in progress.
Currently depends on HPSandbox.
This implementation is currently limited to two-dimensional chains. The size of the grid may be set by the user.
A residue in a chain may execute one of the following moves: (i) an end move or (ii) a crankshaft move.
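To make the move set concrete, here is a minimal sketch of both moves on a square lattice. It uses the textbook definitions of the end move and the crankshaft move, which may differ in detail from what this repository implements. The function names (`end_moves`, `crankshaft_moves`) are hypothetical, and the grid boundary is ignored.

```python
from typing import List, Tuple

Coord = Tuple[int, int]
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def is_adjacent(a: Coord, b: Coord) -> bool:
    """True if two lattice sites are one unit step apart."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1


def end_moves(chain: List[Coord]) -> List[List[Coord]]:
    """Move a terminal residue to any free site next to its chain neighbour."""
    occupied = set(chain)
    results = []
    for end, neighbour in ((0, 1), (len(chain) - 1, len(chain) - 2)):
        nx, ny = chain[neighbour]
        for dx, dy in STEPS:
            site = (nx + dx, ny + dy)
            if site not in occupied:  # also excludes the current position
                new_chain = list(chain)
                new_chain[end] = site
                results.append(new_chain)
    return results


def crankshaft_moves(chain: List[Coord]) -> List[List[Coord]]:
    """Flip a U-shaped pair of residues (i, i+1) to the opposite side."""
    occupied = set(chain)
    results = []
    for i in range(1, len(chain) - 2):
        a, b, c, d = chain[i - 1], chain[i], chain[i + 1], chain[i + 2]
        if not is_adjacent(a, d):
            continue  # residues i-1 .. i+2 do not form a U
        new_b = (2 * a[0] - b[0], 2 * a[1] - b[1])  # reflect b through a
        new_c = (2 * d[0] - c[0], 2 * d[1] - c[1])  # reflect c through d
        if new_b not in occupied and new_c not in occupied:
            new_chain = list(chain)
            new_chain[i], new_chain[i + 1] = new_b, new_c
            results.append(new_chain)
    return results


if __name__ == "__main__":
    print(end_moves([(0, 0), (1, 0), (2, 0)]))
    print(crankshaft_moves([(0, 0), (0, 1), (1, 1), (1, 0)]))
```

Each function returns the list of chains reachable in one move, so repeated application from a starting configuration explores everything the move set can reach.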
Find a policy
Consider a simple HP chain with sequence HHPHH.
Generate a configuration for the chain (relies on HPSandbox):
cd examples
sh make_config.sh
cd ..
Next, sample all configurations that are accessible via the move set:
python config_generator.py --chain_length 5 --grid_size 7 --conf_dir examples --draw True
This should print 13. The configurations are saved to a file in the default directory, here /examples/5, along with an image of each configuration and a movie.
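As a sanity check, 13 is also the number of self-avoiding walks of 5 residues on a square lattice when walks related by translation, rotation, or reflection are counted once. That the script counts configurations in this way is an assumption, not something stated here. The sketch below (the function name `count_conformations` is hypothetical and independent of config_generator.py) enumerates them directly by fixing the first step to the right and forcing the first turn to go up.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def count_conformations(n: int) -> int:
    """Count self-avoiding walks of n residues up to lattice symmetry."""

    def extend(path: List[Coord], turned: bool) -> int:
        if len(path) == n:
            return 1
        x, y = path[-1]
        total = 0
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if not turned and dy == -1:
                continue  # break reflection symmetry: first turn goes up
            site = (x + dx, y + dy)
            if site in path:
                continue  # self-avoidance
            total += extend(path + [site], turned or dy != 0)
        return total

    # Fixing the first bond along +x removes the rotational symmetry.
    return extend([(0, 0), (1, 0)], False)


if __name__ == "__main__":
    print(count_conformations(5))  # -> 13
```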
Finally, compute the optimal policy using value iteration:
python iteration.py -s HHPHH --draw True --verbose True
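For readers unfamiliar with value iteration, here is a minimal generic sketch on a toy deterministic problem. It is not the code in iteration.py: the state names, rewards, discount factor, and the `value_iteration` function are all illustrative assumptions. In the folding setting, states would presumably be chain configurations and actions the moves described above.

```python
from typing import Dict, Tuple

# transitions[state][action] = (next_state, reward); no actions means terminal.
Transitions = Dict[str, Dict[str, Tuple[str, float]]]


def value_iteration(transitions: Transitions, gamma: float = 0.9,
                    tol: float = 1e-8):
    """Return (values, greedy policy) for a deterministic MDP."""
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            if not actions:
                continue  # terminal state keeps value 0
            # Bellman optimality backup over the available actions.
            best = max(r + gamma * values[nxt] for nxt, r in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    # The greedy policy picks the action that attains the optimal value.
    policy = {
        state: max(actions,
                   key=lambda a: actions[a][1] + gamma * values[actions[a][0]])
        for state, actions in transitions.items() if actions
    }
    return values, policy


# Hypothetical three-state example: reaching "folded" earns a reward of 1.
TOY = {
    "unfolded": {"stay": ("unfolded", 0.0), "move": ("partial", 0.0)},
    "partial": {"back": ("unfolded", 0.0), "fold": ("folded", 1.0)},
    "folded": {},
}

if __name__ == "__main__":
    values, policy = value_iteration(TOY)
    print(values)
    print(policy)
```

In this toy problem the resulting policy moves from "unfolded" to "partial" and then folds, since that path collects the only reward.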