DeepMind's library for building modular robotic manipulation environments, both in simulation and on real robots.
A quick-start introductory tutorial can be found at this Colab:
MoMa builds on DeepMind's Composer library (part of dm_control).
Composer helps build simulation environments for reinforcement learning,
providing tools to define actions, observations, and rewards based on MuJoCo
entities.
MoMa wraps Composer to make it easy to build manipulation environments, and the abstractions MoMa introduces allow these environments to work in both simulation and the real world.
MoMa is designed to be modular with respect to the robots in an environment, whether running in simulation or reality, and the task-specific game logic for a single RL environment.
MoMa does this by separating an RL environment into two components: the physical setup and the task logic.
MoMa enforces that the only way to interact with an RL environment is via a set of sensors and effectors, which define the input-output interface of the environment.
Sensors provide an abstraction for real hardware sensors, but they can be
used in simulation as well. They read in information from the simulated or
real world and produce the observations in an RL environment. The sensors
package provides several ready-to-use sensors. You will see examples of sensors
that are used to collect robot joint positions, object positions, gripper
state, etc.
Effectors consume the actions in an RL environment and actuate robots, again
either in simulation or the real world. The effectors package provides
several commonly used effectors.
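To make the sensor/effector split concrete, here is a minimal, self-contained sketch. It does not use MoMa's actual classes: `Sensor`, `Effector`, `ToyArm`, `JointPositionSensor`, and `JointPositionEffector` are hypothetical names invented for illustration, and the "arm" is just a list of joint positions standing in for a simulated or real robot.

```python
from typing import Dict, Protocol, Sequence


class Sensor(Protocol):
    """Hypothetical sensor interface: reads the world, returns observations."""

    def observe(self) -> Dict[str, Sequence[float]]: ...


class Effector(Protocol):
    """Hypothetical effector interface: consumes a command, actuates a robot."""

    def set_control(self, command: Sequence[float]) -> None: ...


class ToyArm:
    """Stand-in for a simulated or real arm (assumption: joints set directly)."""

    def __init__(self, num_joints: int) -> None:
        self.joint_positions = [0.0] * num_joints


class JointPositionSensor:
    """Reads joint positions from the arm and exposes them as an observation."""

    def __init__(self, arm: ToyArm) -> None:
        self._arm = arm

    def observe(self) -> Dict[str, Sequence[float]]:
        return {"joint_pos": list(self._arm.joint_positions)}


class JointPositionEffector:
    """Writes a joint-position command to the arm."""

    def __init__(self, arm: ToyArm) -> None:
        self._arm = arm

    def set_control(self, command: Sequence[float]) -> None:
        if len(command) != len(self._arm.joint_positions):
            raise ValueError("command length must match the number of joints")
        self._arm.joint_positions = list(command)


arm = ToyArm(num_joints=3)
sensor = JointPositionSensor(arm)
effector = JointPositionEffector(arm)

effector.set_control([0.1, -0.2, 0.3])  # the action goes in through the effector
observation = sensor.observe()          # the observation comes out of the sensor
```

In this sketch, code that talks only to `sensor` and `effector` never touches `ToyArm` directly, so the arm could be replaced by a different backend without changing that code.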
At MoMa's core is BaseTask, a variant of composer.Task which contains a
set of sensors and effectors. With this abstraction, BaseTask can encapsulate
a manipulation environment for any robot arm(s) and gripper(s), in either
simulation or reality.
BaseTask represents a "physical" environment (e.g. a single Sawyer
arm and Robotiq gripper with 2 cameras, running either in simulation or
reality), but that alone doesn't define a complete RL environment. For an RL
environment, we need to define the agent's actions, the observations, and the
rewards.
We use two abstractions from DeepMind's AgentFlow to define these:
- agentflow.ActionSpace maps the agent's actions to a new space or to relevant effectors in the BaseTask.
- agentflow.TimestepPreprocessor modifies the base RL timestep before returning it to the agent. Timestep preprocessors can be used to modify observations, add rewards, etc., and they can be chained together. The name "timestep preprocessor" comes from the fact that the timestep is preprocessed before being passed on to the agent. The agentflow.preprocessors package contains many useful, ready-to-use timestep preprocessors.
Together, the ActionSpace and TimestepPreprocessor define the "game logic"
for an RL environment, and they are housed inside an agentflow.SubTask.
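The two pieces of game logic can be sketched in plain Python. This is an illustration under stated assumptions, not the AgentFlow API: `Timestep`, `project_action`, `add_distance_reward`, `terminate_when_close`, and `chain` are hypothetical names, and the planar-to-3D projection and distance-based reward are made-up examples of what an action space and a preprocessor might do.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Timestep:
    """Hypothetical, simplified RL timestep."""

    observation: Dict[str, float]
    reward: float = 0.0
    done: bool = False


Preprocessor = Callable[[Timestep], Timestep]


def project_action(agent_action: Sequence[float]) -> List[float]:
    """Action-space role: map a 2-D agent action to a 3-D effector command."""
    dx, dy = agent_action
    return [dx, dy, 0.0]  # assumption: the z component is held at zero


def add_distance_reward(timestep: Timestep) -> Timestep:
    """Preprocessor role: add a reward computed from an observation."""
    return replace(timestep, reward=-abs(timestep.observation["distance"]))


def terminate_when_close(timestep: Timestep) -> Timestep:
    """Preprocessor role: end the episode once the target is reached."""
    return replace(timestep, done=timestep.observation["distance"] < 0.05)


def chain(preprocessors: Sequence[Preprocessor]) -> Preprocessor:
    """Compose preprocessors so that each one sees the previous one's output."""

    def run(timestep: Timestep) -> Timestep:
        for preprocessor in preprocessors:
            timestep = preprocessor(timestep)
        return timestep

    return run


game_logic = chain([add_distance_reward, terminate_when_close])

command = project_action([0.5, -0.5])
result = game_logic(Timestep(observation={"distance": 0.02}))
```

Because each preprocessor returns a new timestep rather than mutating its input, the order of the chain is the only coupling between the stages.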
If you have a fixed physical setup and you just want to change the task, all
you need to change is the af.SubTask. Likewise, if you have a single task but
want to switch the hardware or switch between sim and real, you can fix the
af.SubTask and just change the BaseTask. See the AgentFlow documentation
for more information.
In cases where there is only one objective for the RL agent (i.e., one instance
of the game logic), you can use MoMa's SubTaskEnvironment, which exposes a
single agentflow.SubTask with DeepMind's standard RL environment interface,
dm_env.Environment.
Here is a diagram presenting the different components of a MoMa subtask environment along with an explanation of information flow and different links to the code.
- The agent sends an action to a MoMa SubTaskEnvironment, which serves as a container for the different components used in a task. The action is passed to an AgentFlow ActionSpace that projects the agent's action to a new action space that matches the spec of the underlying effector(s).
- The projected action is given to the effectors. This allows us to use either sim or real robots for the same task.
- The effectors then actuate the robots, either in sim or in reality.
- The sensors then collect information from the robotics environment. Sensors are an abstraction layer for both sim and real, similar to effectors.
- The BaseTask then passes the timestep to an AgentFlow TimestepPreprocessor. The preprocessor can change the timestep's observations and rewards, and it can terminate an RL episode if some termination criteria are met.
- The modified timestep is then passed on to the agent.
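The information flow described above can be sketched as a single `step` call on a toy one-dimensional world. Everything here is hypothetical and is not MoMa's implementation: `ToyWorld`, `ToySubTaskEnv`, `GOAL`, and `MAX_STEP` are invented names, and the four private methods merely stand in for the action space, effector, sensor, and timestep preprocessor respectively.

```python
from typing import Dict, Tuple


class ToyWorld:
    """Stand-in for the sim or real robot: a point on a line."""

    def __init__(self) -> None:
        self.position = 0.0


class ToySubTaskEnv:
    """Hypothetical container wiring the stages together in order."""

    GOAL = 1.0      # assumed target position
    MAX_STEP = 0.5  # assumed largest displacement per step

    def __init__(self, world: ToyWorld) -> None:
        self._world = world

    def _project(self, agent_action: float) -> float:
        # Action space: clip the agent action to [-1, 1] and scale it
        # into the range of commands the effector accepts.
        clipped = max(-1.0, min(1.0, agent_action))
        return clipped * self.MAX_STEP

    def _actuate(self, command: float) -> None:
        # Effector: apply the command to the world.
        self._world.position += command

    def _sense(self) -> Dict[str, float]:
        # Sensor: read the world state back as an observation.
        return {"position": self._world.position}

    def _preprocess(
        self, observation: Dict[str, float]
    ) -> Tuple[Dict[str, float], float, bool]:
        # Timestep preprocessor: add a reward and a termination flag.
        distance = abs(self.GOAL - observation["position"])
        return observation, -distance, distance < 1e-6

    def step(self, agent_action: float) -> Tuple[Dict[str, float], float, bool]:
        command = self._project(agent_action)
        self._actuate(command)
        observation = self._sense()
        return self._preprocess(observation)


env = ToySubTaskEnv(ToyWorld())
first = env.step(1.0)   # moves by MAX_STEP
second = env.step(5.0)  # out-of-range action is clipped before actuation
```

The agent only ever sees the return value of `step`, which mirrors the point that sensors and effectors are the sole interface to the environment.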
Given a single BaseTask which represents a collection of robots and sensors,
we can support multiple RL tasks and "flow" between them. Each RL task is an
agentflow.SubTask, containing its own "game logic" specifying the agent's
action space, observations, rewards, and episode termination criteria.
AgentFlow contains utilities to specify these different subtasks and define how the agent can move from subtask to subtask. Please see the AgentFlow docs for more information.
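As a rough illustration of moving between subtasks over one fixed physical setup, the sketch below advances to the next subtask whenever the current one reports termination. It is deliberately simplistic and is not how AgentFlow is implemented: `ToySubTask`, `run_sequence`, the "reach" and "lift" subtasks, and the observation keys are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Observation = Dict[str, float]


@dataclass
class ToySubTask:
    """Hypothetical subtask: a name plus its own reward and termination logic."""

    name: str
    reward_fn: Callable[[Observation], float]
    is_done: Callable[[Observation], bool]


def run_sequence(
    subtasks: Sequence[ToySubTask], observations: Sequence[Observation]
) -> List[str]:
    """Returns the name of the subtask that was active for each observation."""
    active = 0
    trace = []
    for observation in observations:
        trace.append(subtasks[active].name)
        # Move on when the active subtask terminates, unless it is the last one.
        if subtasks[active].is_done(observation) and active < len(subtasks) - 1:
            active += 1
    return trace


reach = ToySubTask(
    name="reach",
    reward_fn=lambda obs: -obs["gripper_to_object"],
    is_done=lambda obs: obs["gripper_to_object"] < 0.01,
)
lift = ToySubTask(
    name="lift",
    reward_fn=lambda obs: obs["object_height"],
    is_done=lambda obs: obs["object_height"] > 0.2,
)

# The same stream of observations (one physical setup) drives both subtasks.
observations = [
    {"gripper_to_object": 0.3, "object_height": 0.0},
    {"gripper_to_object": 0.005, "object_height": 0.0},
    {"gripper_to_object": 0.005, "object_height": 0.1},
]
trace = run_sequence([reach, lift], observations)
```

Here the physical setup never changes; only the reward and termination logic applied to its observations does.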
To build a new MoMa environment, you can use the subtask_env_builder pattern. An example of this pattern can be found in our example task and in the tutorial linked at the top.