How to configure R2D2 to improve wall-time #560
MarcoMeter asked this question in Q&A (unanswered) · 1 comment, 1 reply
-
Hello everyone!
How fast is ding's R2D2 implementation compared to the results in the original paper?
My goal is to efficiently exploit 32 cores and one A100 GPU.
I usually run experiments with my recurrent PPO baseline using 32 actors; after roughly 12 hours it reaches a throughput of 150 million steps. I naively ran ding's R2D2 with 24 actors and reached only 5 million steps after 12 hours on the same custom environment used for the PPO experiments. A random agent achieves 10k steps per second on this environment.
The original paper does not provide all details, but it states that a single-GPU learner achieves a throughput of 25,600 steps per second with 256 actors, and that a single actor collects about 260 samples per second on Atari. The computational resources used are not mentioned.
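For reference, these figures work out to the following rates (a quick back-of-the-envelope sketch using only the numbers quoted above):

```python
# Illustrative arithmetic only; all figures are taken from this thread
# and from the R2D2 paper (Kapturowski et al., 2019).
SECONDS = 12 * 3600  # 12 hours = 43,200 s

ppo_steps = 150e6    # recurrent PPO baseline, 32 actors
r2d2_steps = 5e6     # ding R2D2, 24 actors

print(f"PPO throughput:  {ppo_steps / SECONDS:>7,.0f} steps/s")   # ~3,472
print(f"R2D2 throughput: {r2d2_steps / SECONDS:>7,.0f} steps/s")  # ~116

# Paper reference points: 256 actors at ~260 samples/s each,
# versus a reported single-GPU learner consuming 25,600 steps/s.
print(f"Paper collection rate: {256 * 260:,} samples/s")  # 66,560
```

So the R2D2 run above is roughly 30x slower than the PPO baseline and several hundred times slower than what the environment itself allows.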
Do you have any suggestions on how I could change the config (see below) to significantly accelerate the sample throughput during training?
Does the batch size refer to the number of sampled sequences used for optimization, or to the number of experience tuples (i.e., steps)?
Edit: I obviously forgot to set cuda to True. However, I'm now running into exceptions (#561).
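For anyone landing here, below is a minimal sketch of the throughput-relevant knobs in a DI-engine-style R2D2 config. The field names follow my recollection of the `dizoo` Atari examples and may differ between versions; the values are placeholders, not tuned recommendations.

```python
# Hypothetical sketch of an R2D2 config, assuming DI-engine-style nested dicts.
# Field names and nesting may differ across DI-engine versions.
r2d2_config = dict(
    env=dict(
        collector_env_num=32,   # parallel environments feeding the collector
        evaluator_env_num=4,
    ),
    policy=dict(
        cuda=True,              # easy to miss: without this the learner runs on CPU
        priority=True,          # prioritized replay
        burnin_step=20,         # hidden-state warm-up prefix of each sequence
        learn_unroll_len=40,    # BPTT length of each training sequence
        learn=dict(
            update_per_collect=8,
            # To my understanding, batch_size counts *sequences*, not single
            # transitions, so one update consumes roughly
            # batch_size * learn_unroll_len environment steps.
            batch_size=64,
            learning_rate=5e-4,
        ),
        collect=dict(
            n_sample=32,        # trajectories gathered per collect iteration
        ),
    ),
)
```

If that reading of batch_size is right, raising collector_env_num (with n_sample and update_per_collect adjusted to match) is likely the first lever for wall-time here, since an A100 learner is unlikely to be the bottleneck at ~116 steps/s.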
-
Are you still working on this training throughput problem? If so, we will add a distributed R2D2 demo to help you.