How to configure R2D2 to improve wall-time #560

MarcoMeter · 2022-12-20T07:59:51Z

MarcoMeter
Dec 20, 2022

Hello everyone!

How fast is ding's R2D2 implementation in comparison to the original paper results?
My goal is to efficiently exploit 32 cores and one A100 GPU.

I usually run experiments using my recurrent PPO baseline utilizing 32 actors. After roughly 12 hours, I achieve a throughput of 150 Mio steps. I naively run ding's R2D2 using 24 actors and achieved only 5 Mio steps after 12 hours on the same custom environment that is used for the PPO experiments. A random agent achieves 10k steps per second on this environment.

The original paper does not provide all details, but they state that a single GPU learner achieves a throughput of 25600 steps per second using 256 actors. One actor on Atari is described to be capable of collecting 260 samples per second on Atari. The utilized computational resources are not mentioned.

Do you have any suggestions on how I could change the config (see below) to significantly accelerate the sample throughput during training?
Does the batch size refer to the number of sampled sequences for optimization? Or does it refer to the number of experience tuples (i.e. steps)?

edit: I obviously forgot to set cuda to True. However, I'm running now into exceptions (#561).

from easydict import EasyDict

collector_env_num = 24
evaluator_env_num = 16
mmgrid_r2d2_config = dict(
    exp_name='mmgrid_r2d2_seed0',
    env=dict(
        collector_env_num=collector_env_num,
        evaluator_env_num=evaluator_env_num,
        n_evaluator_episode=evaluator_env_num,
        stop_value=0.9,
    ),
    policy=dict(
        cuda=False,
        priority=False,
        priority_IS_weight=False,
        model=dict(
            obs_shape=[3, 84, 84],
            action_shape=4,
            encoder_hidden_size_list=[128, 128, 256]
            # https://github.com/opendilab/DI-engine/blob/main/dizoo/mario/mario_dqn_config.py
        ),
        discount_factor=0.995,
        nstep=5,
        burnin_step=2,
        # (int) the whole sequence length to unroll the RNN network minus
        # the timesteps of burnin part,
        # i.e., <the whole sequence length> = <unroll_len> = <burnin_step> + <learn_unroll_len>
        learn_unroll_len=40,
        learn=dict(
            # according to the R2D2 paper, actor parameter update interval is 400
            # environment timesteps, and in per collect phase, we collect 32 sequence
            # samples, the length of each sample sequence is <burnin_step> + <unroll_len>,
            # which is 100 in our seeting, 32*100/400=8, so we set update_per_collect=8
            # in most environments
            update_per_collect=5,
            batch_size=64,
            learning_rate=0.0005,
            target_update_theta=0.001,
        ),
        collect=dict(
            # NOTE: It is important that set key traj_len_inf=True here,
            # to make sure self._traj_len=INF in serial_sample_collector.py.
            # In R2D2 policy, for each collect_env, we want to collect data of length self._traj_len=INF
            # unless the episode enters the 'done' state.
            # In each collect phase, we collect a total of <n_sample> sequence samples.
            n_sample=128,
            unroll_len=2 + 40,
            traj_len_inf=True,
            env_num=collector_env_num,
        ),
        eval=dict(env_num=evaluator_env_num, evaluator=dict(eval_freq=30)),
        other=dict(
            eps=dict(
                type='exp',
                start=0.95,
                end=0.05,
                decay=2500000, # https://github.com/opendilab/DI-engine/blob/main/dizoo/mario/mario_dqn_config.py
            ), replay_buffer=dict(replay_buffer_size=1000000, )
        ),
    ),
)
mmgrid_r2d2_config = EasyDict(mmgrid_r2d2_config)
main_config = mmgrid_r2d2_config
mmgrid_r2d2_create_config = dict(
    env_manager=dict(type='base'),
    policy=dict(type='r2d2'),
)
mmgrid_r2d2_create_config = EasyDict(mmgrid_r2d2_create_config)
create_config = mmgrid_r2d2_create_config

PaParaZz1 · 2022-12-29T11:29:16Z

PaParaZz1
Dec 29, 2022
Maintainer

Do you still work on this training throughput problem? If so, we will add a distributed R2D2 demo to help you.

1 reply

MarcoMeter Dec 29, 2022
Author

My current state is that I achieve roughly 15 million steps after 12 hours while using cuda. Can you recommend a set of hyperparameters (e.g. collector_env_num , update_per_collect, batch_size, n_sample, ...) that achieve a higher throughput on just one node using 32 cores and one A100?

I don't have enough resources to run R2D2 on multiple nodes. Right now my goal is to find baselines to compare my work to.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to configure R2D2 to improve wall-time #560

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment 1 reply

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

How to configure R2D2 to improve wall-time #560

MarcoMeter Dec 20, 2022

Replies: 1 comment · 1 reply

PaParaZz1 Dec 29, 2022 Maintainer

MarcoMeter Dec 29, 2022 Author

MarcoMeter
Dec 20, 2022

Replies: 1 comment 1 reply

PaParaZz1
Dec 29, 2022
Maintainer

MarcoMeter Dec 29, 2022
Author