Robust And Approximate Markov decision processes

Craam is a C++ library for solving plain, robust, or optimistic Markov decision processes. The library also provides basic tools that enable simulation and construction of MDPs from samples. There is also support for state aggregation and abstraction solution methods.

The library supports standard finite or infinite horizon discounted MDPs [Puterman2005]. Some basic stochastic shortest path methods are also supported. The library assumes maximization over actions. The states and actions must be finite.

The robust model extends regular MDPs [Iyengar2005]. Unlike some published work on this topic, the library can model uncertainty in both the transitions and the rewards. Uncertainty is modeled by adding outcomes to each action; the outcome is assumed to be minimized by nature, similar to [Filar1997].

In summary, the MDP problem being solved is:

v(s) = \max_{a \in \mathcal{A}} \min_{o \in \mathcal{O}} \sum_{s' \in \mathcal{S}} ( r(s,a,o,s') + \gamma P(s,a,o,s') v(s') ) ~.

Here, \mathcal{S} denotes the set of states, \mathcal{A} the set of actions, and \mathcal{O} the set of outcomes.

Available algorithms are value iteration and modified policy iteration. The library supports both the plain worst-case outcome method and a worst case with respect to a base distribution.
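
For illustration, the following is a minimal, library-independent sketch of this max-min backup for a single state: it maximizes over actions, lets nature minimize over outcomes, and evaluates each outcome by the probability-weighted expectation over next states. The names (Transition, robust_backup) and the dense data layout are purely illustrative and are not part of the CRAAM API.

#include <algorithm>
#include <limits>
#include <vector>

// One (target state, probability, reward) triple; illustrative only.
struct Transition {
    std::size_t target;
    double probability;
    double reward;
};

// outcomes[a][o] lists the transitions for action a and outcome o in a fixed state.
double robust_backup(const std::vector<std::vector<std::vector<Transition>>>& outcomes,
                     const std::vector<double>& valuefunction, double gamma) {
    double best = -std::numeric_limits<double>::infinity();
    for (const auto& action : outcomes) {                  // maximize over actions
        double worst = std::numeric_limits<double>::infinity();
        for (const auto& outcome : action) {               // nature minimizes over outcomes
            double expected = 0.0;
            for (const auto& t : outcome)                  // expectation over next states
                expected += t.probability * (t.reward + gamma * valuefunction[t.target]);
            worst = std::min(worst, expected);
        }
        best = std::max(best, worst);
    }
    return best;
}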

Installation

The library has minimal dependencies and should compile on all major operating systems.

Minimal Requirements

  • CMake 3.1.0
  • C++14 compatible compiler
    • Tested with Linux GCC 4.9.2, 5.2.0, and 6.1.0; does not work with GCC 4.7 or 4.8.
    • Tested with Linux Clang 3.6.2 (and maybe 3.2+).
  • Boost to enable unit tests and for some simple numerical algebra

Python Interface Dependencies

  • Python 3.3+ (Python 2 is NOT supported)
  • Setuptools 7.0
  • Numpy 1.8+
  • Cython 0.21+

Optional Dependencies

  • OpenMP to enable parallel computation
  • Doxygen 1.8.0+ to generate documentation

Build Instructions

Build all default supported targets:

$ cmake .
$ cmake --build .

Run unit tests

Note that Boost must be present in order to build the tests.

$ cmake .
$ cmake --build . --target testit

Build a benchmark executable

To run a benchmark problem, first download and decompress one of the two benchmark test files; both benchmark problems were generated randomly.

The small benchmark problem, for example, can be executed as follows:

$ cmake --build . --target benchmark
$ mkdir data
$ cd data
$ wget https://www.dropbox.com/s/b9x8sz7q5ow1vm4/ss.zip
$ unzip ss.zip
$ cd ..
$ bin/benchmark data/smallsize_test.csv

Development

CMake can generate project files for a variety of IDEs. For more options, see:

$ cmake --help

Getting Started

The main interface to the library is through the class RMDP. The class supports simple construction of an MDP and provides several methods for solving it.

States, actions, and outcomes are identified using 0-based contiguous indexes. Actions are indexed independently for each state, and outcomes are indexed independently for each state-action pair.

Transitions are added through the functions RMDP::add_transition and RMDP::add_transition_d; the object is automatically resized to accommodate the new transitions. The problem is then solved with one of the value iteration or modified policy iteration methods, such as mpi_jac in the example below.

The following is a simple example of formulating and solving a small MDP.

#include "RMDP.hpp"
#include "modeltools.hpp"

#include <iostream>
#include <vector>

using namespace craam;

int main(){
    MDP mdp(3);
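    // argument order: add_transition(mdp, from_state, action, to_state, probability, reward)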

    // transitions for action 0
    add_transition(mdp,0,0,0,1,0);
    add_transition(mdp,1,0,0,1,1);
    add_transition(mdp,2,0,1,1,1);

    // transitions for action 1
    add_transition(mdp,0,1,1,1,0);
    add_transition(mdp,1,1,2,1,0);
    add_transition(mdp,2,1,2,1,1.1);

    // solve using Jacobi value iteration
    auto&& re = mdp.mpi_jac(Uncertainty::Average,0.9);

    for(auto v : re.valuefunction){
        std::cout << v << " ";
    }

    return 0;
}

To compile the file, run:

$ g++ -std=c++14 -I<path_to_craam_headers> -L . simple.cpp -lcraam

Documentation

The documentation can be generated using doxygen; the configuration file and the documentation are in the doc directory.
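
For example, assuming the configuration file in doc uses Doxygen's default name (Doxyfile), the documentation can be generated with:

$ cd doc
$ doxygen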

General Assumptions

  • Transition probabilities must be non-negative but do not need to add up to a specific value
  • Transitions with zero probability may be omitted, but each transition must include at least one target state
  • State with no actions: treated as a terminal state with value 0 (see the sketch after this list)
  • Action with no outcomes: Terminates with an error
  • Outcome with no target states: Terminates with an error
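
As a small illustration of the terminal-state convention above, the following sketch reuses only the functions shown in the Getting Started example (MDP, add_transition, mpi_jac) to build a two-state MDP in which state 1 has no actions and is therefore treated as a terminal state with value 0:

#include "RMDP.hpp"
#include "modeltools.hpp"

#include <iostream>

using namespace craam;

int main(){
    MDP mdp(2);

    // the only action in state 0 moves to state 1 with probability 1 and reward 5
    add_transition(mdp,0,0,1,1.0,5.0);

    // state 1 has no actions: it is treated as a terminal state with value 0,
    // so the value of state 0 is 5 regardless of the discount factor
    auto&& sol = mdp.mpi_jac(Uncertainty::Average,0.9);
    std::cout << sol.valuefunction[0] << std::endl;

    return 0;
}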

Common Use Cases

  1. Formulate an uncertain MDP
  2. Compute a solution to an uncertain MDP
  3. Compute value of a fixed policy
  4. Compute an occupancy frequency
  5. Simulate transitions of an MDP
  6. Construct MDP from samples
  7. Simulate a general domain

References

[Filar1997] Filar, J., & Vrieze, K. (1997). Competitive Markov Decision Processes. Springer.
[Puterman2005] Puterman, M. L. (2005). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc.
[Iyengar2005] Iyengar, G. N. (2005). Robust dynamic programming. Mathematics of Operations Research, 30(2), 1–29.
[Petrik2014] Petrik, M., & Subramanian, S. (2014). RAAM: The benefits of robustness in approximating aggregated MDPs in reinforcement learning. In Neural Information Processing Systems (NIPS).
[Petrik2016] Petrik, M., & Luss, R. (2016). Interpretable Policies for Dynamic Product Recommendations. In Uncertainty in Artificial Intelligence (UAI).
