Try theta_2 cuda upload to use Triple sum #422

Open
wants to merge 2 commits into
base: main
@whdlgp commented Dec 13, 2024

The following error can occur when using the "Triple sum" function:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

The cause:

  • theta_0 and theta_1 are moved to CUDA by the code below,
  • but theta_2 is not.
    Code in "mergers.py", line 418:
        theta_0[key] = theta_0[key].to("cuda")
        theta_1[key] = theta_1[key].to("cuda")

To fix this, I added a simple guarded move for theta_2:

        theta_0[key] = theta_0[key].to("cuda")
        theta_1[key] = theta_1[key].to("cuda")
        try:
            theta_2[key] = theta_2[key].to("cuda")
        except NameError:
            None
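
A possible alternative to the try/except, sketched under assumptions: check explicitly whether a third state dict is present and holds the key before moving it. `move_key_to_device` and `FakeTensor` are hypothetical names that do not come from the repository. `FakeTensor` only stands in for a `torch.Tensor` so that the sketch runs without PyTorch or a GPU, and representing an unused third model as `None` is an assumption about how a two-model merge looks.

```python
class FakeTensor:
    """Stand-in for torch.Tensor: only records which device it is on."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        # Like torch.Tensor.to, return a tensor placed on the target device.
        return FakeTensor(device)


def move_key_to_device(key, device, *thetas):
    """Move theta[key] to `device` for every state dict that holds `key`.

    State dicts that are None or lack the key are skipped (assumed to mean
    that the model is not part of the merge). Returns the number moved.
    """
    moved = 0
    for theta in thetas:
        if theta is not None and key in theta:
            theta[key] = theta[key].to(device)
            moved += 1
    return moved


theta_0 = {"w": FakeTensor()}
theta_1 = {"w": FakeTensor()}
theta_2 = {"w": FakeTensor()}

# Three-model merge: all three entries end up on the same device.
n_three = move_key_to_device("w", "cuda", theta_0, theta_1, theta_2)
# Two-model merge: the missing third state dict is skipped.
n_two = move_key_to_device("w", "cuda", theta_0, theta_1, None)
```

With real tensors the same loop would call `torch.Tensor.to`. The explicit check avoids hiding unrelated errors behind an exception handler.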

This increases GPU memory usage because theta_2 now also resides on the GPU, but the merge succeeds as long as there is enough VRAM.

In my testing, Triple sum runs on an RTX 3080 Ti with 12 GB of VRAM, though memory usage comes close to the limit.
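
As a rough illustration of the extra memory, the arithmetic for one weight can be sketched as follows. The shape and the 16-bit element size are hypothetical values chosen for the example, not measurements from this PR, and `tensor_bytes` is an illustrative helper. The total for a real checkpoint depends on its parameter count and on how long each tensor stays on the GPU.

```python
def tensor_bytes(shape, bytes_per_element):
    """Size in bytes of one dense tensor with the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element


MIB = 1024 ** 2

# Hypothetical example: one 4096 x 4096 weight stored as 16-bit floats.
one = tensor_bytes((4096, 4096), 2)
two_models = 2 * one    # theta_0 and theta_1 on the GPU
three_models = 3 * one  # theta_2 moved to the GPU as well

print(one // MIB, two_models // MIB, three_models // MIB)  # 32 64 96
```

Under these assumptions, holding the third model's copy of a weight adds half again to the memory taken by the first two.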

Please review my commit; I hope it helps.

Added: Fix exception handling

Updated the exception handling from except NameError to the more general except Exception so that two-model merges are also handled. The handler changed from

        except NameError:
            None

to

        except Exception as e:
            pass # Do nothing
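
To show why the narrower handler may not be enough, here is a small sketch of which exception type indexing raises in different situations. Which of these cases a two-model merge actually hits is not stated in this PR, so the three inputs below are assumptions, and `raised_by_lookup` and `raised_by_undefined_name` are hypothetical helpers.

```python
def raised_by_lookup(theta_2, key):
    """Return the name of the exception raised by theta_2[key], or None."""
    try:
        theta_2[key]
    except Exception as e:
        return type(e).__name__
    return None


def raised_by_undefined_name():
    """Return the exception name for using a variable that was never assigned."""
    try:
        never_assigned_theta["w"]  # deliberately undefined name
    except NameError as e:
        return type(e).__name__


print(raised_by_undefined_name())         # NameError
print(raised_by_lookup(None, "w"))        # TypeError
print(raised_by_lookup({}, "w"))          # KeyError
print(raised_by_lookup({"w": 1.0}, "w"))  # None
```

Only the first case is caught by except NameError; a theta_2 that exists but is None or lacks the key raises a different type. Note that except Exception would also swallow errors raised by .to("cuda") itself, such as an out-of-memory RuntimeError, so an explicit presence check may be worth considering.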
