Memory grows due to keeping losses on device #763

Open
carmocca opened this issue Dec 27, 2024 · 1 comment

@carmocca
Contributor

If logging is disabled (or very infrequent), memory usage slowly grows because the max and average losses are kept in a list on-device: https://github.com/pytorch/torchtitan/blob/main/train.py#L353-L354

The training loop should offload these tensors to the CPU right after their aggregation finishes, especially since the logging prints will do that under the hood anyway.
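
For illustration, a minimal sketch of the offloading pattern suggested here (not the actual train.py code; `record_loss` and `log_and_reset` are hypothetical names):

```python
# Sketch only: copy each aggregated loss to the CPU right away, so nothing
# accumulates on the GPU between log steps.
import torch

losses_since_last_log: list[torch.Tensor] = []

def record_loss(avg_loss: torch.Tensor) -> None:
    # .detach() drops the autograd graph; .cpu() performs the device-to-host
    # copy immediately (the same copy a later logging print would trigger anyway).
    losses_since_last_log.append(avg_loss.detach().cpu())

def log_and_reset() -> None:
    if losses_since_last_log:
        stacked = torch.stack(losses_since_last_log)  # already on CPU
        print(f"avg loss {stacked.mean():.4f} | max loss {stacked.max():.4f}")
        losses_since_last_log.clear()
```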

@tianyu-l
Contributor

Thanks for raising this issue, @carmocca !

It looks to me like what's kept on device is https://github.com/pytorch/torchtitan/blob/main/train.py#L330 (the max and average you mentioned are on CPU?).

I think the reason we keep it is that we don't want to call .item() (which incurs synchronization between CPU and GPU) unless we hit a log step. I do agree that if logging is disabled / infrequent, this overhead is unnecessary. That said, may I ask: what's the use case where you'd log so infrequently that this overhead becomes unacceptable?
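
For comparison, a minimal sketch of the trade-off described above (hypothetical names, not the actual train.py): per-step losses stay on device to avoid a sync on every step, and .item() is only called at log steps, which is also when the list is cleared:

```python
# Sketch only: defer the CPU/GPU synchronization to log steps.
import torch

device_losses: list[torch.Tensor] = []

def training_step(loss: torch.Tensor, step: int, log_freq: int) -> None:
    device_losses.append(loss.detach())  # stays on device, no sync here
    if log_freq > 0 and step % log_freq == 0:
        stacked = torch.stack(device_losses)
        # .item() forces a synchronization, but only once per log_freq steps.
        print(f"step {step}: avg {stacked.mean().item():.4f}, "
              f"max {stacked.max().item():.4f}")
        device_losses.clear()
    # If logging is disabled (log_freq <= 0), device_losses keeps growing --
    # the memory growth reported in this issue.
```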
