Refactor llm perf backend handling #258
Conversation
I also added the new label system to the leaderboard; you can trigger it via the 'leaderboard' label:
if: ${{
  (github.event_name == 'push') ||
  (github.event_name == 'workflow_dispatch') ||
  contains(github.event.pull_request.labels.*.name, 'leaderboard') }}
You can probably add more specific labels here to be able to run specific benchmarks, like cuda/cpu. I didn't try it, but you might also be able to add conditions on matrix arguments, like || contains(github.event.pull_request.labels.*.name, matrix.subset)}}, to run specific subsets or specific machines.
class CUDAPyTorchBenchmarkRunner(BenchmarkRunner):
    def __init__(self):
        super().__init__(backend="pytorch", hardware="cuda")
I would prefer device="cuda" so as not to confuse terminologies; optimum-benchmark already has an established terminology and it's better not to make it ambiguous.
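For illustration, a minimal sketch of what the suggested rename could look like; the base-class signature shown here is an assumption for this sketch, not the PR's actual code:

from abc import ABC


class BenchmarkRunner(ABC):
    # assumed signature for illustration; the real class lives in
    # llm_perf/common/benchmark_runner.py
    def __init__(self, backend: str, device: str):
        self.backend = backend
        self.device = device  # "device" keeps optimum-benchmark's terminology


class CUDAPyTorchBenchmarkRunner(BenchmarkRunner):
    def __init__(self):
        # device="cuda" instead of hardware="cuda"
        super().__init__(backend="pytorch", device="cuda")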
llm_perf/common/benchmark_runner.py (Outdated)
from optimum_benchmark.logging_utils import setup_logging


class BenchmarkRunner(ABC):
Not a fan of using the keyword runner here; maybe an LLMPerfBenchmarkManager? No strong opinion on this, though.
llm_perf/common/benchmark_runner.py (Outdated)
self.attention_configs = self._get_attention_configs()
self.weights_configs = self._get_weights_configs(self.subset)
This is very specific to the PyTorch backend (the only backend with attention implementation control); I don't see it holding once more backends are introduced.
llm_perf/common/benchmark_runner.py (Outdated)
def run_benchmark(self, model: str, attn_implementation: str, weights_config: str):
    benchmark_name = f"{weights_config}-{attn_implementation}"
    subfolder = f"{benchmark_name}/{model.replace('/', '--')}"

    if not self.is_benchmark_supported(weights_config, attn_implementation):
        self.logger.info(f"Skipping benchmark {benchmark_name} with model {model} since it is not supported")
        return
This won't hold for other backends either, as their configuration will be in terms of other optimizations to use.
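A rough sketch of one way to keep the shared runner backend-agnostic, in line with the two comments above: each backend subclass defines its own configuration space and support check. The method names (get_benchmark_configs, is_benchmark_supported taking a generic config dict) and all concrete values are assumptions for illustration, not the PR's actual API.

from abc import ABC, abstractmethod


class BenchmarkRunner(ABC):
    def __init__(self, backend: str, device: str):
        self.backend = backend
        self.device = device

    @abstractmethod
    def get_benchmark_configs(self) -> list[dict]:
        """Each backend enumerates its own sweep of optimizations."""

    @abstractmethod
    def is_benchmark_supported(self, config: dict) -> bool:
        """Each backend decides which of its configs are valid."""

    def run_benchmarks(self, model: str) -> None:
        for config in self.get_benchmark_configs():
            if not self.is_benchmark_supported(config):
                print(f"Skipping unsupported config {config} for {model}")
                continue
            # ... launch the benchmark via optimum-benchmark here ...


class CUDAPyTorchBenchmarkRunner(BenchmarkRunner):
    def __init__(self):
        super().__init__(backend="pytorch", device="cuda")

    def get_benchmark_configs(self) -> list[dict]:
        # attention/weights options are PyTorch-specific, so they are
        # built here instead of in the shared base class
        attention_configs = ["eager", "sdpa"]     # placeholder values
        weights_configs = ["float16", "float32"]  # placeholder values
        return [
            {"attn_implementation": a, "weights_config": w}
            for a in attention_configs
            for w in weights_configs
        ]

    def is_benchmark_supported(self, config: dict) -> bool:
        # placeholder rule: everything is supported in this sketch
        return True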
I like the hardware class which registers info about the hardware configs we deal with.
The BenchmarkRunner class seems to me like it will make things a bit convoluted and less readable. The benefit of the one-file script design is that it showcases how simple and linear it is to conduct benchmarks (like a model fine-tuning recipe), similar to the transformers examples. One might argue that we could group the examples in a class called ExamplesRunner, but for a user who wants to reproduce results by reading and running a simple script, it complicates things.
But if this is truly what will make it simpler for you to add more benchmarks to the leaderboard, I have no issue with it.
@regisss
The main idea of abstracting some of the logic of the llm-perf leaderboard code is to prevent code duplication between all the different benchmarking scripts. This should speed up development and also facilitate maintainability. I agree it would be nice to run a simple script to get the results for the leaderboard. It is still possible to run the benchmarks after cloning the repo and doing …
Yeah, I see your point. It's just that I moved llm-perf into a folder in optimum-benchmark based on the idea that it's just a couple of scripts that don't require a lot of maintenance and work as an example of how to use optimum-benchmark. But if the plan is to make it a more complex project, then maybe it's time for it to have its own repo again, to ease its development for you. WDYT?
There's also the fact that its compute is scaling, and I think it'd be better if it were contained so that it doesn't slow down or throttle optimum-benchmark's development CI.
Good question... (I like working in this repo as I learn from your code and reviews quite a bit 😄, but I don't mind changing.) Also @regisss, do you have an opinion on the matter?
It was for the reason above, and for faster development, as optimum-benchmark was changing at a higher rate (before the PyPI release and transformers adoption).
It's mostly using the CLI/API interface, which we're keeping consistent across releases, so I don't see any alignment issues.
I think a separation of package and application makes more sense, like …
OK, then I will do a separate repo for llm-perf! I will make another PR this week to remove the current llm-perf logic from optimum-benchmark, as this won't be needed anymore.
Awesome! And don't hesitate to ping me for reviews if needed.
This PR refactors the llm-perf leaderboard logic to be more extensible and allows adding more hardware benchmarks without code duplication.
Main changes: