Apply suggestions from code review
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Signed-off-by: Sven Mika <sven@anyscale.io>
sven1977 and angelinalg authored Jan 4, 2025
1 parent 3c326f1 commit 3efaa47
Showing 1 changed file with 26 additions and 26 deletions.
52 changes: 26 additions & 26 deletions doc/source/rllib/scaling-guide.rst

.. _rllib-scaling-guide-docs:

RLlib scaling guide
===================

RLlib is a distributed and scalable RL library, based on `Ray <https://www.ray.io/>`__. An RLlib :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`
uses `Ray actors <https://docs.ray.io/en/latest/ray-core/actors.html>`__ wherever parallelization of
its sub-components can speed up sample and learning throughput.

.. figure:: images/scaling_axes_overview.svg
    :width: 600
    :align: left

**Scalable axes in RLlib**: Three scaling axes are available across all RLlib :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` classes:

- the number of :py:class:`~ray.rllib.env.env_runner.EnvRunner` actors in the :py:class:`~ray.rllib.env.env_runner_group.EnvRunnerGroup`,
  settable through ``config.env_runners(num_env_runners=n)``
- the number of vectorized sub-environments on each
  :py:class:`~ray.rllib.env.env_runner.EnvRunner` actor, settable through ``config.env_runners(num_envs_per_env_runner=p)``
- the number of :py:class:`~ray.rllib.core.learner.learner.Learner` actors in the
  :py:class:`~ray.rllib.core.learner.learner_group.LearnerGroup`, settable through ``config.learners(num_learners=m)``
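
For orientation, the following is a minimal sketch that combines all three axes. The algorithm
(PPO), the environment, and the concrete values are illustrative assumptions, not part of the original text:

.. testcode::

    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .environment("CartPole-v1")
        # Number of EnvRunner actors and vectorized sub-environments per EnvRunner.
        .env_runners(num_env_runners=2, num_envs_per_env_runner=4)
        # Number of Learner actors.
        .learners(num_learners=2)
    )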


Scaling the number of EnvRunner actors
--------------------------------------

You can control the degree of parallelism for the sampling machinery of the
:py:class:`~ray.rllib.algorithms.algorithm.Algorithm` by increasing the number of remote
:py:class:`~ray.rllib.env.env_runner.EnvRunner` actors in the :py:class:`~ray.rllib.env.env_runner_group.EnvRunnerGroup`
through the config as follows.

.. testcode::

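    # The original snippet is collapsed in this diff view; an assumed, minimal sketch
    # that raises the number of remote EnvRunner actors on an existing ``config``:
    config.env_runners(num_env_runners=4)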

To assign resources to each :py:class:`~ray.rllib.env.env_runner.EnvRunner`, use:

.. testcode::

    config.env_runners(
        num_gpus_per_env_runner=..,
    )

See this
`example of an EnvRunner and RL environment requiring a GPU resource <https://github.com/ray-project/ray/blob/master/rllib/examples/gpus/gpus_on_env_runners.py>`__.

The number of GPUs may be fractional quantities, for example 0.5, to allocate only a fraction of a GPU per
:py:class:`~ray.rllib.env.env_runner.EnvRunner`.
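
For example, a sketch that gives each :py:class:`~ray.rllib.env.env_runner.EnvRunner` half a GPU
(the value 0.5 is taken from the text above):

.. testcode::

    config.env_runners(num_gpus_per_env_runner=0.5)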

Note that there's always one "local" :py:class:`~ray.rllib.env.env_runner.EnvRunner` in the
:py:class:`~ray.rllib.env.env_runner_group.EnvRunnerGroup`.
If you only want to sample using this local :py:class:`~ray.rllib.env.env_runner.EnvRunner`,
set ``num_env_runners=0``. This local :py:class:`~ray.rllib.env.env_runner.EnvRunner` directly sits in the main
:py:class:`~ray.rllib.algorithms.algorithm.Algorithm` process.
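
A corresponding sketch for local-only sampling:

.. testcode::

    config.env_runners(num_env_runners=0)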

.. hint::
The Ray team may decide to deprecate the local :py:class:`~ray.rllib.env.env_runner.EnvRunner` some time in the future.
It still exists for historical reasons. Whether it's useful enough to keep in the set is still under debate.


Scaling the number of envs per EnvRunner actor
----------------------------------------------
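
The configuration snippet for this axis is collapsed in this view; the following is a minimal
sketch, assuming the ``num_envs_per_env_runner`` setting named in the overview above:

.. testcode::

    # Run four vectorized sub-environments on each EnvRunner actor.
    config.env_runners(num_envs_per_env_runner=4)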

By default, the individual sub-environments in a vector ``step`` and ``reset`` in sequence, making only
the action computation of the RL environment loop parallel, because observations can move through the model
in a batch.
However, `gymnasium <https://gymnasium.farama.org/>`__ supports an asynchronous
vectorization setting, in which each sub-environment receives its own Python process.
This way, the vector environment can ``step`` or ``reset`` in parallel. Activate
this asynchronous vectorization behavior through:

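.. testcode::

    # The original snippet is collapsed in this diff view. A minimal sketch, assuming the
    # ``gym_env_vectorize_mode`` argument of ``config.env_runners()``:
    config.env_runners(
        num_envs_per_env_runner=4,
        gym_env_vectorize_mode="ASYNC",
    )
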
This setting can speed up the sampling process significantly in combination with ``num_envs_per_env_runner > 1``,
especially when your RL environment's stepping process is time consuming.

See this `example script <https://github.com/ray-project/ray/blob/master/rllib/examples/envs/async_gym_env_vectorization.py>`__ that demonstrates a massive speedup with async vectorization.


Scaling the number of Learner actors
------------------------------------

Set the number of remote :py:class:`~ray.rllib.core.learner.learner.Learner` actors through the config:

.. testcode::

    config = (
        PPOConfig()  # or any other AlgorithmConfig; the original snippet is partly collapsed in this view
        .learners(num_learners=2)
    )

Typically, you use as many :py:class:`~ray.rllib.core.learner.learner.Learner` actors as you have GPUs available for training.
Make sure to set the number of GPUs per :py:class:`~ray.rllib.core.learner.learner.Learner` to 1:

.. testcode::

    config.learners(num_gpus_per_learner=1)

.. warning::
For some algorithms, such as IMPALA and APPO, the performance of a single remote
:py:class:`~ray.rllib.core.learner.learner.Learner` actor (``num_learners=1``) compared to a
single local :py:class:`~ray.rllib.core.learner.learner.Learner` instance (``num_learners=0``)
depends on whether you have a GPU available or not.
If exactly one GPU is available, you should run these two algorithms with ``num_learners=0, num_gpus_per_learner=1``;
if no GPU is available, set ``num_learners=1, num_gpus_per_learner=0``. If more than 1 GPU is available,
set ``num_learners=.., num_gpus_per_learner=1``.
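
For example, a sketch of the two settings described above:

.. testcode::

    # Exactly one GPU available: run the local Learner on that GPU.
    config.learners(num_learners=0, num_gpus_per_learner=1)

    # No GPU available: use one remote, CPU-only Learner.
    config.learners(num_learners=1, num_gpus_per_learner=0)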

The number of GPUs may be fractional quantities, for example 0.5, to allocate only a fraction of a GPU per
:py:class:`~ray.rllib.core.learner.learner.Learner`. For example, you can pack five :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`
instances onto one GPU by setting ``num_learners=1, num_gpus_per_learner=0.2``.
See this `fractional GPU example <https://github.com/ray-project/ray/blob/master/rllib/examples/gpus/fractional_gpus.py>`__
for details.
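
A corresponding sketch (the values are taken from the text above):

.. testcode::

    config.learners(num_learners=1, num_gpus_per_learner=0.2)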

.. note::
If you specify ``num_gpus_per_learner > 0`` and your machine doesn't have the required number of GPUs
available, the experiment may stall until the Ray autoscaler brings up enough machines to fulfill the resource request.
If your cluster has autoscaling turned off, this setting then results in a seemingly hanging experiment run.

On the other hand, if you set ``num_gpus_per_learner=0``, RLlib builds the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`
instances solely on CPUs, even if GPUs are available on the cluster.


Outlook: More RLlib elements that should scale
----------------------------------------------

There are other components and aspects in RLlib that should be able to scale up.

For example, the model size is limited to whatever fits on a single GPU, due to
"distributed data parallel" (DDP) being the only way in which RLlib scales :py:class:`~ray.rllib.core.learner.learner.Learner`
actors.

