Author: Jihwan Jeong
In this directory, we provide symbolic toolkits that you can use with pyRDDLGym. Specifically, we provide the following:
- Symbolic compilation of CPFs (conditional probability functions) in XADD (eXtended Algebraic Decision Diagram)
- Dynamic Bayes Net (DBN) visualization, allowing dependence analysis
- Symbolic Dynamic Programming (SDP):
- RDDL simulation with the symbolically compiled model
To assist your understanding and ease the process of getting started with the pyRDDLGym-symbolic
library, we have prepared a Google Colab notebook. This notebook includes a detailed installation procedure as well as simple, illustrative examples to help you quickly grasp how to use the library effectively.
Follow the steps below to access and use the Google Colab notebook:
-
Access the Notebook: Click on the following link to open the Google Colab notebook: .
-
Follow the Instructions: The notebook contains step-by-step instructions for installing necessary packages to use with the
pyRDDLGym-symbolic
library and running simple examples. -
Feedback and Questions: If you encounter any issues or have questions regarding the
pyRDDLGym-symbolic
library or the Google Colab notebook, please feel free to open an issue on the GitHub repository.
Happy coding!
# Create and activate a new conda environment
conda create -n symbolic python # Note: Python 3.12 won't work with Gurobi 10.
conda activate symbolic
# Install the xaddpy >= 0.2.5
pip install xaddpy
# Install pyRDDLGym >= 2.0
pip install pyrddlgym
# (Optional) install rddlrepository-v2
pip install rddlrepository
# (Optional) Install Gurobipy (make sure you have a license)
python -m pip install gurobipy
# Finally, install pyRDDLGym_symbolic
cd ~/path/to/pyRDDLGym_symbolic
pip install -e .
# Or simply, install from pypi
pip install pyrddlgym-symbolic
In addition, you'll need to install pygraphviz
to visualize XADDs in the graph format.
- For Ubuntu/Debian users, run the following command.
sudo apt-get install graphviz graphviz-dev
- For Fedora and Red Hat systems, you can do as follows.
sudo dnf install graphviz graphviz-devel
- For Mac users, you can use
brew
to installgraphviz
.
brew install graphviz
Unfortunately, we do not provide support for Windows systems, though you can refer to the pygraphviz documentation for information.
- Linux systems
pip install pygraphviz
- MacOS: please use conda install
conda install pygraphviz
Note that due to the default installation location by brew
, you need to provide some additional options for pip
installation.
XADD (eXtended Algebraic Decision Diagram) [Sanner at al., 2011] enables compact representation and operations with symbolic variables and functions. In fact, this data structure can be used to represent CPFs defined in a RDDL domain once it is grounded for a specific RDDL instance.
We use the xaddpy package that provides a Python implementation of XADD (originally implemented in Java). To install the package, simply run the following:
pip install xaddpy
In this article, we are going to walk you through how you can use xaddpy
to compile a CPF of a grounded fluent into an XADD node.
For example, let's look at the Wildfire instance of 3 x 3 locations.
Once the CPFs are grounded for this instance, we can see that the values of the non-fluents will simplify the CPFs. For instance, the neighboring cells of the (x1, y1)
location are (x1, y2)
, (x2, y1)
, and (x2, y2)
; hence, burning'(x1, y1)
should only depend on the states of these neighbors --- plus (x1, y1)
itself --- but none others.
Once you compile the CPFs of this instance into an XADD, you can actually see this structure easily. In other words, XADD compilation reveals the DBN dependency structures of different variables, which we also explain below.
To run the XADD compilation, we first need to import the domain and instance files. Then, we instantiate the RDDLModelXADD
class with the grounded CPFs given by the RDDLGrounder
object. The RDDLModelXADD
has the method called compile
which will compile the pyRDDLGym Expression
objects to XADD
nodes.
You can find an example run script from pyRDDLGym_symbolic/examples/run_xadd_compilation.py.
A nice way to interpret the resulting XADD may be to visualize it as a graph.
You can do this by calling the save_graph
method of the XADD
object.
# This is the object that has compiled XADD nodes
xadd_model = RDDLModelXADD(...)
# RDDLModelXADD.context is the XADD context object that
# handles all operations and stores all the nodes
xadd_model.context.save_graph(
xadd_model.cpfs["burning___x1__y1'"],
file_name="Wildfire/burning___x1__y1"
)
Here's the result:
If the figure is too small to comprehend, you can click the link above to check out the XADD graph. Notice that the leaf nodes contain either a Boolean value or a real value. This is the case when you pass reparam=False
to the RDDLModelXADD
class constructor. Otherwise, you'll see the Bernoulli variables in the CPFs reparameterized using uniform random variables. When we don't reparameterize, the leaf nodes show the Bernoulli probability values.
How will the graph look like for out-of-fuel'(x1, y1)
variable? Here's the result of context.save_graph(model_xadd.cpfs["out-of-fuel_x1_y1'"], file_name="out_of_fuel_x1_y1")
:
Very neat!
Although the Wildfire example nicely shows how XADD can be used to represent the CPFs of the domain, it only contains Boolean variables. In this part, we will show another example domain that has continuous fluents.
The domain we want to look at is the UAV mixed domain, whose definition is provided here.
If we follow the same procedure described above for the Wildfire domain with the domain name being replaced by 'UAV/Mixed'
, then we can compile the domain/instance in XADD. The overall DBN (dynamic Bayes net) structure of this instance is shown below.
Specifically, let's print out the CPF of vel'(?a1)
, which is
( [-1 + set_acc___a1 <= 0]
( [1 + set_acc___a1 <= 0]
( [-0.1 + vel___a1 <= 0]
( [0] )
( [-0.1 + vel___a1] )
)
( [set_acc___a1 + 10*vel___a1 <= 0]
( [0] )
( [0.1*set_acc___a1 + vel___a1] )
)
)
( [0.1 + vel___a1 <= 0]
( [0] )
( [0.1 + vel___a1] )
)
)
When visualized with pygraphviz
, we get the following:
In this case, you can see that the decision nodes have linear inequlity expressions instead of a Boolean decision. As for the function values at the leaf nodes, they are also linear expressions. xaddpy
package can also handle arbitrary nonlinear decisions and function values using SymEngine/SymPy under the hood.
Now, you can go ahead and use this functionality to analyze a given RDDL instance!
Next, we can now go ahead and draw DBN diagrams of various RDDL domain/instances. As a running example, we show how you can visualize a Wildfire instance as defined in pyRDDLGym_symbolic/examples/files/Wildfire.
If you want to run an example code and follow the steps for better understanding, please take a look at the run_dbn_visualization.py file.
Firstly, you can instantiate a RDDL2Graph
object by specifying the domain, instance, and some other parameters.
from pyRDDLGym_symbolic.core.visualizer import RDDL2Graph
domain = 'Wildfire'
domain_file = f'pyRDDLGym_symbolic/examples/files/{domain}/domain.rddl'
instance_file = f'pyRDDLGym_symbolic/examples/files/{domain}/instance0.rddl'
r2g = RDDL2Graph(
domain=domain,
domain_file=domain_file,
instance_file=instance_file,
directed=True,
strict_grouping=True,
)
Then, you can visualize the corresponding DBN by calling
r2g.save_dbn(file_name='Wildfire')
which will save a file named Wildfire_inst_0.pdf
to ./tmp/Wildfire
. Additionally, you can check the Wildfire_inst_0.txt
file which records grounded fluent names and their parents in the DBN.
The output of the function call looks like this (opens a new tab showing the PDF file, which we omit to include here since it would take up quite some space).
You can also specify a single fluent and/or a ground fluent that you are interested in for visualization. For example,
r2g.save_dbn(file_name='Wildfire', fluent='burning', gfluent='x1__y1', file_type='png')
will output the following graph:
huh, Nice! You can see from this diagram that the next state transition of the burning state at (x1, y1) only depends on 6 grounded variables (i.e., whether neighboring cells are burning; whether this location is out of fuel; whether the put-out action has been taken).
To give you a taste of another example, here's the DBN visualization of the Power Generation instance, in which intermediate variables are placed in the middle column:
With the run_vi.py file, you can run a value iteration solver.
Here, we provide a detailed dissection of the run script.
First, we compile a given RDDL domain/instance to XADD. This step follows the same procedure as in the examples above, so we skip it here.
mdp_parser = Parser()
mdp = mdp_parser.parse(
xadd_model,
xadd_model.discount,
concurrency=rddl_ast.instance.max_nondef_actions,
is_linear=args.is_linear,
include_noop=not args.skip_noop,
is_vi=True,
)
Then, in lines 46 - 54, we instantiate an MDPParser
object that has the parse
method, which interprets the XADD RDDL model and construct some necessary attributes, like CPFs and such.
Some important operations that happen within the parser are as follows:
- Bound analysis on continuous variables (lines 50-57 and lines 102-105):
# Configure the bounds of continuous states.
cont_s_vars = set()
for s in model.state_fluents:
if model.variable_ranges[s] != 'real':
continue
cont_s_vars.add(model.ns[s])
cont_state_bounds = self.configure_bounds(mdp, model.invariants, cont_s_vars)
mdp.cont_state_bounds = cont_state_bounds
...
...
# Configure the bounds of continuous actions.
if len(mdp.cont_a_vars) > 0:
cont_action_bounds = self.configure_bounds(mdp, model.preconditions, mdp.cont_a_vars)
mdp.cont_action_bounds = cont_action_bounds
Here, the parser has a method called configure_bounds
in which we perform the analysis on bounds of continuous variables. Specifically, the bound information has to be provided in state-invariants
and action-preconditions
blocks of the original RDDL domain file. If no bounds are provided for a variable, we assume [-inf, inf]
as its bounds.
Once configured, this bound information is then updated to the XADD
context object such that each continuous symbolic variable is associated with its upper and lower bounds.
- Handling of concurrent boolean actions (lines 75 - 90)
# Add concurrent actions for Boolean actions.
if is_vi:
# Need to consider all combinations of boolean actions.
# Note: there's always an implicit no-op action with which
# none of the boolean actions are set to True.
total_bool_actions = tuple(
_truncated_powerset(
bool_actions,
mdp.max_allowed_actions,
include_noop=include_noop,
))
for actions in total_bool_actions:
names = tuple(a.name for a in actions)
symbols = tuple(a.symbol for a in actions)
action = BActions(names, symbols, model)
mdp.add_action(action)
This part is where we handle concurrent actions, specifically for Value Iteration. Here we have a few modeling assumptions. First, continuous actions will always be concurrent, so we only specifically handle concurrent Boolean actions. Second, we provide an option to either use or not use a no-op
action, which sets all Boolean action values to False
.
Now, let's say we have 2 Boolean actions: move___a1
and pick___a1
. When the concurrency is set to 1
and we allow the noop
action, then we'll have the following Boolean actions:
noop
(i.e.,{move___a1: False, pick___a1: False}
){move___a1: True, pick___a1: False}
{move___a1: False, pick___a1: True}
On the other hand, if the concurrency is set to 2
, then we will have the following concurrent Boolean actions:
noop
(i.e.,{move___a1: False, pick___a1: False}
){move___a1: True, pick___a1: False}
{move___a1: False, pick___a1: True}
{move___a1: True, pick___a1: True}
That is, the concurrency value specifies the maximum number of Boolean actions that can be taken at each time step, so we should consider all possible combinations, which is done by the _truncated_powerset
helper function.
We define a class BActions
that can handle any of these concurrent actions. More importantly, the class implements a restrict
method in which we restrict a given XADD with the associated action values.
- Constructing the full CPFs for Boolean next state and interm variables (line 114)
By calling mdp.update(is_vi=is_vi)
, we update the CPFs of Boolean next state and interm variables to fully consider P(b'=0|...)
. This is a necessary step as in the RDDL file, we have only specified the probability of a Boolean variable being True
. Also, the update
method links updated CPFs with each action.
Finally, we call vi_solver.solve()
which will perform SDP to obtain the optimal symbolic value function.
Notice that the solve
method is shared by the ValueIteration
and PolicyEvaluation
solvers; hence, it's defined in base.py. The method will return the integer ID of the optimal value function at a set iteration number.
With the run_pe.py file, you can run a policy evaluation solver.
The script is exactly the same as run_vi.py until the XADD RDDL model compilation is done. Then, a slight difference of PE from VI is what we pass to the MDPParser.parse
function.
mdp = mdp_parser.parse(
xadd_model,
xadd_model.discount,
concurrency=rddl_ast.instance.max_nondef_actions,
is_linear=args.is_linear,
is_vi=False,
)
In PE, we do not have to specify the maximum concurrency value to the parser as that should be implicitly determined by the given policy. Instead, we set is_vi=False
such that we do not create BActions
objects.
Then, in lines 56 - 62, we instantiate a PolicyParser
object and parse the policy provided in a json format, specified by the argument --policy_fpath
. An example policy json file looks like the following (p1.json):
{
"action-fluents": ["a"],
"a": "pyRDDLGym_symbolic/examples/files/RobotLinear_1D/policy/a.xadd"
}
A policy json file should have the following field:
- "action-fluents": a list of grounded action variable names.
Then, it should be followed by "action-name": "file path" pairs for all actions specified in "action-fluents". This json file should specify the file path of each and every action fluent of a given problem; otherwise, an assertion error will occur from the parser.
The value of one action variable points to the file path where the XADD of that action is defined. The PolicyParser
will read in the XADD and perform some checks (e.g., type and dependency checks). Check out the comments in the policy_parser.py file for more detailed information.
The PolicyParser
class implements an assertion that in the entire state space no more than the set concurrency
number of Boolean actions can be set to True
. Check out the _assert_concurrency
method in lines 153 - 175 of policy_parser.py.
A unique step in PE is where we substitute in the policy XADDs into the CPFs and the reward function. See lines 20 - 58 of pe.py. Note how we handle the continuous and Boolean action variables differently.
Once all the CPFs and reward function are restricted with the given policy XADDs, the remaining steps are identical to VI, except that we do not have to iterate over actions as they have all been already incorporated into CPFs.
Optionally, we showcase how you can simulate an XADD-compiled RDDL instance in pyRDDLGym_symbolic/examples/run_simulation.py. Check out pyRDDLGym_symbolic/core/simulation.py to see how things are implemented.
Disclaimer: note that we won't actively support the simulation functionality as it has a limited use-case. Hence, there is a chance that an update in the main pyRDDLGym repository breaks the current simulation code we provide in this repository. This code is only provided as an example.
If you use the code provided in this repository, please use the following bibtex for citation:
@InProceedings{pmlr-v162-jeong22a,
title = {An Exact Symbolic Reduction of Linear Smart {P}redict+{O}ptimize to Mixed Integer Linear Programming},
author = {Jeong, Jihwan and Jaggi, Parth and Butler, Andrew and Sanner, Scott},
booktitle = {Proceedings of the 39th International Conference on Machine Learning},
pages = {10053--10067},
year = {2022},
volume = {162},
series = {Proceedings of Machine Learning Research},
month = {17--23 Jul},
publisher = {PMLR},
}