Fix PyTorch CI HUD dashboard missing perf numbers: hf_Whisper #1935

Closed
wants to merge 4 commits

Conversation


@xmfan xmfan commented Sep 25, 2023

A few models were passing the accuracy check but surprisingly failing the perf run, resulting in dashboard entries like:
[screenshot: HUD dashboard entries with missing perf numbers]

Reproducing the HUD's commands locally:

# pass
python benchmarks/dynamo/torchbench.py --accuracy --no-translation-validation --training --amp --backend inductor --disable-cudagraphs --device cuda --total-partitions 4 --partition-id 1 --output hf_Whisper_accuracy.csv --only hf_Whisper

# fail (on https://github.com/pytorch/benchmark/blob/4ea3bba3b8010f5d4a629bb8f530a92570f34518/torchbenchmark/util/model.py#L195C48-L195C48)
python benchmarks/dynamo/torchbench.py --performance --cold-start-latency --training --amp --backend inductor --disable-cudagraphs --device cuda --total-partitions 4 --partition-id 1 --output hf_Whisper_perf.csv --only hf_Whisper

The error suggests that hf_Whisper does not provide a batch size for the training mode perf run.

Summarizing discussion with @xuzhao9:

I think we could:

  1. set a default train batch size for hf_Whisper, if you still want to test the forward/backward pass without a defined train test
  2. in model.py, make sure self.batch_size is not None (before the accuracy check overrides the batch size to 4)

This PR implements option 1: we set default batch sizes in the parent class of all benchmark models, with the ability for individual models to override them.
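
For reference, a minimal sketch of option 1 (illustrative only: the class and attribute names follow torchbenchmark/util/model.py's BenchmarkModel / DEFAULT_TRAIN_BSIZE / DEFAULT_EVAL_BSIZE, but the fallback values and the HfWhisperModel override are made-up placeholders, not the values landed in this PR):

# Sketch of option 1: the parent class supplies fallback batch sizes, and
# individual models override the class attributes as needed.
class BenchmarkModel:
    DEFAULT_TRAIN_BSIZE = 1   # assumed fallback, not the real default
    DEFAULT_EVAL_BSIZE = 1    # assumed fallback, not the real default

    def __init__(self, test, batch_size=None):
        self.test = test
        # A user-provided batch size wins; otherwise fall back to the
        # per-test default defined on the (sub)class.
        if batch_size is not None:
            self.batch_size = batch_size
        elif test == "train":
            self.batch_size = self.DEFAULT_TRAIN_BSIZE
        else:
            self.batch_size = self.DEFAULT_EVAL_BSIZE

class HfWhisperModel(BenchmarkModel):
    # hf_Whisper only declares its own defaults; hypothetical values.
    DEFAULT_TRAIN_BSIZE = 8
    DEFAULT_EVAL_BSIZE = 8

With something like this in place, the perf run's training pass gets a batch size even when the model never defined a train-specific one.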

@xmfan xmfan requested a review from xuzhao9 September 25, 2023 21:15
@xmfan xmfan marked this pull request as ready for review September 25, 2023 21:15
torchbenchmark/util/model.py (review comment, outdated and resolved)
@xmfan xmfan changed the title Set default batch sizes for train/eval benchmark models Fix PyTorch CI HUD dashboard missing perf numbers Sep 26, 2023
@xuzhao9 xuzhao9 left a comment


The code is much clearer now. Thanks for making this improvement!

@facebook-github-bot

@xmfan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@xmfan xmfan changed the title Fix PyTorch CI HUD dashboard missing perf numbers Fix PyTorch CI HUD dashboard missing perf numbers: hf_Whisper Sep 26, 2023
@msaroufim

A lot of models are missing. I'm curious how many more models are affected by this issue. cc @bdhirsh

@facebook-github-bot

@xmfan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot

@xmfan merged this pull request in 3f1c3eb.

elif self.test == "eval" and (not self.batch_size == self.DEFAULT_EVAL_BSIZE):
-    raise NotImplementedError("Model doesn't support customizing batch size.")
+    raise NotImplementedError(f"Model doesn't support customizing batch size, but {self.test} test is providing a batch size other than DEFAULT_EVAL_BSIZE")
Contributor:
I don't understand why a model would have ALLOW_CUSTOMIZE_BSIZE but we would end up in this branch. For context, I'm looking into why we are not running stable_diffusion_unet in inference

Contributor:
Why don't we just use the batch size of the model instead of failing?

Contributor:
If ALLOW_CUSTOMIZE_BSIZE = False, the model will only accept the default batch size, not the batch size specified by the user.

We could silently use the default batch size instead of failing, but my concern is that this would cause misunderstanding on the user side (for example, the user might think the model is running with batch size 100, while ALLOW_CUSTOMIZE_BSIZE = False and the default batch size is 1, so it would silently run with batch size 1).

Contributor:
If batch_size is passed in as None, it seems okay to use the default specified on the model, instead of the one specified in self.metadata["devices"][current_device_name][device_batch_size_key].

Contributor:
Also, if we're worried about that case, we should also fix this upstream handling of it: https://github.com/pytorch/pytorch/blob/main/benchmarks/dynamo/torchbench.py#L369-L374

Contributor:
> If batch_size is passed in as None, it seems okay to use the default specified on the model, instead of the one specified in self.metadata["devices"][current_device_name][device_batch_size_key].

Right, this is a bug. If ALLOW_CUSTOMIZE_BSIZE = False and batch_size is passed in as None, we should use the default specified on the model instead of the device-specified batch size.
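
As a rough sketch of what that fix could look like (resolve_batch_size is a hypothetical helper, and the metadata lookup is simplified; this is not the actual patch):

# Hypothetical helper illustrating the fix: when ALLOW_CUSTOMIZE_BSIZE is
# False and no batch size was requested, use the model's own default rather
# than the device-specific metadata entry.
def resolve_batch_size(model, requested_batch_size, current_device_name, device_batch_size_key):
    default = (model.DEFAULT_TRAIN_BSIZE if model.test == "train"
               else model.DEFAULT_EVAL_BSIZE)
    if not model.ALLOW_CUSTOMIZE_BSIZE:
        if requested_batch_size is None:
            # The bug discussed above: fall back to the model default instead
            # of the device-specified batch size.
            return default
        if requested_batch_size != default:
            raise NotImplementedError(
                f"Model doesn't support customizing batch size, but {model.test} "
                "test is providing a non-default batch size"
            )
        return requested_batch_size
    # Customization allowed: caller's value, then device metadata, then default.
    device_meta = model.metadata.get("devices", {}).get(current_device_name, {})
    return (requested_batch_size
            or device_meta.get(device_batch_size_key)
            or default)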
