
Add HF Auth mixin to Stable Diffusion #1763

Closed · wants to merge 22 commits
Conversation

@msaroufim (Member) commented Jul 13, 2023

Right now stable diffusion and lit-llama are not actually running in CI because they get rate limited by Hugging Face. Since we've now added an auth token as a GitHub secret, we can move stable diffusion out of canary and do things like include it in the blueberries dashboard.

We also added some helpful errors so people running torchbench locally know they will need a token to run these models.

Auth is implemented as a mixin, which seems like the right abstraction.
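For context, a minimal sketch of what such an auth mixin could look like. This is illustrative, not the exact code in the PR: the class name, env var check, and error message are assumptions based on the description above.

```python
import os


class HuggingFaceAuthMixin:
    """Illustrative sketch (not the PR's exact code): fail fast with a
    clear message when no Hugging Face token is available, so local
    torchbench users know why gated model downloads would fail."""

    def __init__(self):
        # Assumed env var name; the token is set as a GitHub secret in CI.
        if "HUGGINGFACE_HUB_TOKEN" not in os.environ:
            raise NotImplementedError(
                "Set HUGGINGFACE_HUB_TOKEN to run this model; downloads "
                "from the Hugging Face Hub are otherwise rate limited."
            )
```

A model class would then inherit from this mixin and call its `__init__` before attempting to download weights.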

Some relevant details about the model

Torchbench has a function get_module() whose intent is to test an nn.Module on an actual torch.Tensor.

Unfortunately, a StableDiffusionPipeline is not an nn.Module; it is a composition of a tokenizer and three separate nn.Modules: a text encoder, a VAE, and a UNet.

text_encoder

    def get_module(self):
        batch_size = 1
        sequence_length = 10
        vocab_size = 32000

        # Generate random indices within the valid range
        input_tensor = torch.randint(low=0, high=vocab_size, size=(batch_size, sequence_length))

        # Make sure the tensor has the correct data type
        input_tensor = input_tensor.long()
        print(self.pipe.text_encoder(input_tensor))
        return self.pipe.text_encoder, input_tensor

The text encoder outputs a BaseModelOutputWithPooling, which is a structured result rather than a single tensor: https://gist.github.com/msaroufim/51f0038863c5cce4cc3045e4d9f9c399

======================================================================
FAIL: test_stable_diffusion_example_cuda (__main__.TestBenchmark)
----------------------------------------------------------------------
components._impl.workers.subprocess_rpc.ChildTraceException: Traceback (most recent call last):
  File "/home/ubuntu/benchmark/components/_impl/workers/subprocess_rpc.py", line 482, in _run_block
    exec(  # noqa: P204
  File "<subprocess-worker>", line 35, in <module>
  File "<subprocess-worker>", line 12, in _run_in_worker_f
  File "/home/ubuntu/benchmark/torchbenchmark/util/model.py", line 26, in __call__
    obj.__post__init__()
  File "/home/ubuntu/benchmark/torchbenchmark/util/model.py", line 126, in __post__init__
    self.accuracy = check_accuracy(self)
  File "/home/ubuntu/benchmark/torchbenchmark/util/env_check.py", line 469, in check_accuracy
    model, example_inputs = maybe_cast(tbmodel, model, example_inputs)
  File "/home/ubuntu/benchmark/torchbenchmark/util/env_check.py", line 424, in maybe_cast
    example_inputs = clone_inputs(example_inputs)
  File "/home/ubuntu/benchmark/torchbenchmark/util/env_check.py", line 297, in clone_inputs
    assert isinstance(value, torch.Tensor)
AssertionError
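One hedged workaround (an assumption on my part, not something this PR landed) would be a thin wrapper that unwraps the structured output so forward() returns a plain tensor, which is what torchbench's accuracy checks expect:

```python
import torch


class TextEncoderWrapper(torch.nn.Module):
    """Hypothetical adapter: unwrap BaseModelOutputWithPooling so the
    encoder's forward() returns a plain torch.Tensor."""

    def __init__(self, text_encoder):
        super().__init__()
        self.text_encoder = text_encoder

    def forward(self, input_ids):
        # last_hidden_state is a plain torch.Tensor
        return self.text_encoder(input_ids).last_hidden_state
```

get_module() could then return `TextEncoderWrapper(self.pipe.text_encoder), input_tensor` instead of the raw encoder.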

vae

    def get_module(self):
        print(self.pipe.vae(torch.randn(9,3,9,9)))

The same problem applies to the VAE:
https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/vae.py#L27

unet

    def get_module(self):
        # This will only benchmark the unet since that's the biggest layer
        # Stable diffusion is a composition of a text encoder, unet and vae
        encoder_hidden_states = torch.randn(320, 1024)
        sample = torch.randn(4, 4, 4, 32)
        timestep = 5
        inputs_to_pipe = {'timestep': timestep, 'encoder_hidden_states': encoder_hidden_states, 'sample': sample}
        result = self.pipe.unet(**inputs_to_pipe)
        return self.pipe.unet, inputs_to_pipe

The UNet unfortunately does not take a single tensor input; its forward expects keyword arguments, so the example inputs end up as a dict rather than tensors.
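One possible workaround, sketched here as an assumption rather than the PR's actual fix, is an adapter that gives the UNet a tensors-only positional signature so torchbench's clone_inputs (which asserts isinstance(value, torch.Tensor)) can handle the example inputs:

```python
import torch


class UNetWrapper(torch.nn.Module):
    """Hypothetical adapter: expose the UNet through a positional,
    tensors-only signature."""

    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states):
        return self.unet(sample=sample, timestep=timestep,
                         encoder_hidden_states=encoder_hidden_states)


# The example inputs then become a tuple of tensors only; the int
# timestep is promoted to a 0-d tensor:
# (torch.randn(4, 4, 4, 32), torch.tensor(5), torch.randn(320, 1024))
```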

For the VAE and the encoder, the test failure is particularly helpful:

(sam) ubuntu@ip-172-31-9-217:~/benchmark$ python test.py -k "test_stable_diffusion_example_cuda"
F
======================================================================
FAIL: test_stable_diffusion_example_cuda (__main__.TestBenchmark)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ubuntu/benchmark/test.py", line 75, in example_fn
    assert accuracy == "pass" or accuracy == "eager_1st_run_OOM", f"Expected accuracy pass, get {accuracy}"
AssertionError: Expected accuracy pass, get eager_1st_run_fail

----------------------------------------------------------------------
Ran 1 test in 7.402s

FAILED (failures=1)

@msaroufim msaroufim requested a review from xuzhao9 July 13, 2023 01:42
@msaroufim msaroufim requested a review from xuzhao9 July 13, 2023 17:45
@msaroufim msaroufim changed the title Add Auth to Stable Diffusion Add HF Auth to Stable Diffusion Jul 13, 2023
@msaroufim msaroufim changed the title Add HF Auth to Stable Diffusion Add HF Auth mixin to Stable Diffusion Jul 13, 2023
@xuzhao9 (Contributor) commented Jul 14, 2023

Looks like we also need to set the env value in the Docker run command so that the Docker container can access it: https://github.com/pytorch/benchmark/blob/main/.github/workflows/pr-a10g.yml#L38

Use docker run -e HUGGINGFACE_HUB_TOKEN=${HUGGINGFACE_HUB_TOKEN} ... to set the env value in the docker container.
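For illustration, the workflow step might look something like the following. This is a sketch only; the step name and image variable are assumptions, and the real file is the pr-a10g.yml linked above:

```yaml
# Illustrative fragment: forward the repository secret into the container.
- name: Run benchmarks in Docker
  env:
    HUGGINGFACE_HUB_TOKEN: ${{ secrets.HUGGINGFACE_HUB_TOKEN }}
  run: |
    docker run -e HUGGINGFACE_HUB_TOKEN="${HUGGINGFACE_HUB_TOKEN}" ...
```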

@msaroufim (Member, Author) commented Jul 15, 2023

The last part that's tripping me up is how to make get_module() work for this PR. @xuzhao9 I posted some logs and an explanation above, let me know if you have any thoughts.

@msaroufim msaroufim requested a review from xuzhao9 July 17, 2023 16:08
@msaroufim (Member, Author) commented Jul 18, 2023

Thanks @xuzhao9 for the offline help; the example error was fixed locally.

I had to run python run.py "stable_diffusion" -d cuda --accuracy to see the real error

@xuzhao9 (Contributor) left a review comment:

Nice to see the CI is green on this PR. Great job!

@facebook-github-bot commented: @msaroufim has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot commented: @msaroufim merged this pull request in 411e388.
