Fix PyTorch CI HUD dashboard missing perf numbers: hf_Whisper #1935
Conversation
8a89e44 to 12a31f3
The code is much clearer now. Thanks for making this improvement!
@xmfan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
A lot of models are missing, I'm curious how many more models are affected by this issue cc @bdhirsh |
elif self.test == "eval" and (not self.batch_size == self.DEFAULT_EVAL_BSIZE):
-    raise NotImplementedError("Model doesn't support customizing batch size.")
+    raise NotImplementedError(f"Model doesn't support customizing batch size, but {self.test} test is providing a batch size other than DEFAULT_EVAL_BSIZE")
I don't understand why a model would have ALLOW_CUSTOMIZE_BSIZE set but we would still end up in this branch. For context, I'm looking into why we are not running stable_diffusion_unet in inference.
Why don't we just use the batch size of the model instead of failing?
If ALLOW_CUSTOMIZE_BSIZE = False, the model will only accept the default batch size, not the batch size specified by the user.
We could silently use the default batch size instead of failing, but my concern is that this would cause a misunderstanding on the user side (for example, they might think the model is running with batch size 100, but ALLOW_CUSTOMIZE_BSIZE = False and the default batch size is 1, so it would silently run with batch size 1).
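The fail-loudly behavior described above can be sketched as follows. This is a minimal illustration, not the actual TorchBench class: the class name and constructor signature are assumptions, while ALLOW_CUSTOMIZE_BSIZE and DEFAULT_EVAL_BSIZE mirror the attributes named in this thread.

```python
# Minimal sketch of the batch-size check being discussed.
# The Model class here is illustrative, not the real TorchBench code.
class Model:
    ALLOW_CUSTOMIZE_BSIZE = False  # this model only accepts its default
    DEFAULT_EVAL_BSIZE = 1

    def __init__(self, test, batch_size):
        self.test = test
        self.batch_size = batch_size
        if (not self.ALLOW_CUSTOMIZE_BSIZE
                and self.test == "eval"
                and self.batch_size != self.DEFAULT_EVAL_BSIZE):
            # Fail loudly rather than silently falling back to the default,
            # so the user is not misled about which batch size actually ran.
            raise NotImplementedError(
                f"Model doesn't support customizing batch size, but "
                f"{self.test} test is providing a batch size other than "
                f"DEFAULT_EVAL_BSIZE"
            )

# A user asking for batch size 100 gets an explicit error instead of
# silently running with batch size 1:
try:
    Model(test="eval", batch_size=100)
except NotImplementedError as e:
    print("rejected:", e)
```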
If batch_size is passed in as None, it seems okay to use the default specified on the model, instead of the one specified in self.metadata["devices"][current_device_name][device_batch_size_key].
Also, if we're worried about that case, we should also fix this upstream handling of it: https://github.com/pytorch/pytorch/blob/main/benchmarks/dynamo/torchbench.py#L369-L374
> If batch_size is passed in as None, it seems okay to use the default specified on the model, instead of specified in self.metadata["devices"][current_device_name][device_batch_size_key]

Right, this is a bug. If ALLOW_CUSTOMIZE_BSIZE = False and batch_size is passed in as None, we should use the default specified on the model instead of the device-specified batch size.
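A small sketch of the resolution logic agreed on above, assuming batch_size may arrive as None. The helper name and signature are hypothetical; only the behavior (None falls back to the model default, an explicit non-default value still errors when customization is disallowed) comes from the discussion.

```python
# Hypothetical helper illustrating the agreed batch-size resolution.
def resolve_batch_size(batch_size, allow_customize, model_default):
    if batch_size is None:
        # No explicit request: fall back to the model's own default,
        # not the device-specified batch size.
        return model_default
    if not allow_customize and batch_size != model_default:
        # Explicit non-default request on a model that forbids it: fail loudly.
        raise NotImplementedError("Model doesn't support customizing batch size.")
    return batch_size

print(resolve_batch_size(None, allow_customize=False, model_default=1))  # falls back to 1
print(resolve_batch_size(1, allow_customize=False, model_default=1))     # default is accepted
```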
A few models were passing the accuracy check but, surprisingly, failing the perf run, resulting in dashboard entries like:
Reproducing the HUD's commands locally,
The error suggests that hf_Whisper does not provide a batch size for the training mode perf run.
Summarizing discussion with @xuzhao9:
Implementing option 1: we set default batch sizes in the parent class of all benchmark models, with the ability for individual models to override them.
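The parent-class pattern described above might look like the following sketch. The class names, the DEFAULT_TRAIN_BSIZE attribute, and the fallback value of 1 are assumptions for illustration, not the actual repository code.

```python
# Sketch of option 1: the benchmark base class carries fallback batch-size
# defaults, and individual models override them as class attributes.
class BenchmarkModel:
    DEFAULT_TRAIN_BSIZE = 1  # fallback when a model defines nothing
    DEFAULT_EVAL_BSIZE = 1

    def __init__(self, test, batch_size=None):
        if batch_size is None:
            # No explicit batch size: use the (possibly overridden) default.
            batch_size = (self.DEFAULT_TRAIN_BSIZE if test == "train"
                          else self.DEFAULT_EVAL_BSIZE)
        self.test = test
        self.batch_size = batch_size

class HfWhisper(BenchmarkModel):
    # Model-specific override of the parent-class default.
    DEFAULT_TRAIN_BSIZE = 8

print(HfWhisper("train").batch_size)       # model override
print(BenchmarkModel("train").batch_size)  # parent-class fallback
```

With this shape, the training perf run always has a batch size to use, so a model like hf_Whisper can no longer fail for lack of one.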