Add performance benchmarking scripts and run in CI #1978
Conversation
A few fewer replications, some more seeds. Every benchmark now takes between 10 and 20 seconds (on my machine).
That allows switching branches without benchmark results disappearing.
Prints some progress information while running and saves the results to a pickle file afterwards.
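A minimal sketch of what that saving step could look like; the result structure and the file naming are assumptions for illustration, not necessarily what the actual script does:

```python
import pickle
import time

# Hypothetical result structure collected by the benchmark loop:
# {model_name: {seed: (init_time, run_time)}}.
results: dict = {}

# Use a unique file name so results from one branch are not overwritten
# when you switch branches and run the benchmarks again.
filename = f"timings_{int(time.time())}.pickle"
with open(filename, "wb") as f:
    pickle.dump(results, f)
print(f"Saved benchmark results to {filename}")
```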
- The bootstrap_speedup_confidence_interval function calculates the mean speedup and its confidence interval using bootstrapping, which is more suitable for paired data.
- The mean speedup and confidence interval are calculated for both initialization and run times.
- Positive values indicate an increase in time (longer duration), and negative values indicate a decrease (shorter duration).
- The results are displayed in a DataFrame with the percentage changes and their confidence intervals.
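A minimal sketch of how such a paired bootstrap could look; the function name follows the list above, while the signature, number of resamples, and input layout are assumptions for illustration:

```python
import numpy as np

def bootstrap_speedup_confidence_interval(old, new, n_boot=10_000, ci=95, rng=None):
    """Mean relative change (new vs. old) with a bootstrap confidence interval.

    `old` and `new` are paired arrays of timings (same seeds/replications).
    Returns (mean, lower, upper) as fractions; multiply by 100 for percent.
    """
    rng = np.random.default_rng(rng)
    old = np.asarray(old, dtype=float)
    new = np.asarray(new, dtype=float)
    # Relative change per pair: positive means the new code is slower.
    rel_change = (new - old) / old
    n = len(rel_change)
    # Resample whole pairs with replacement and record the mean of each resample.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = rel_change[idx].mean(axis=1)
    alpha = (100 - ci) / 2
    lower, upper = np.percentile(boot_means, [alpha, 100 - alpha])
    return rel_change.mean(), lower, upper
```

Resampling whole pairs, rather than resampling the old and new timings independently, preserves the per-seed pairing, which is what makes the interval suitable for paired data.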
/rerun-benchmarks
/rerun-benchmarks
Edit: https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#issue_comment
(I'm truly sorry if someone has set email notifications for "ready for review".)
I think there is a VS Code extension that lets you run GitHub Actions locally, maybe give it a try?
I don't, but why are you doing this? To trigger GitHub Actions?
Unfortunately, I'm 90% sure. The error:
implies
Okay, there are a few minor things, like Black and ruff failing because the added benchmark models don't adhere to their standards, but the benchmark scripts are otherwise ready. General usage (once this is merged to main):
- On one branch (for example main), run global_benchmarks.py.
- Switch to the branch you want to compare and run global_benchmarks.py again.
- Run compare_timings.py.
It should output a table like this:
Positive indicates an increase in runtime, negative a decrease. I will add fancy emojis tomorrow to make it really clear whether it's good, bad, or insignificant.
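A rough sketch of how compare_timings.py could assemble such a table from two result pickles; the file names, data layout, column labels, and the helper module import are assumptions for illustration, not the actual script's interface:

```python
import pickle

import pandas as pd

# Hypothetical helper module containing the bootstrap sketch shown earlier.
from benchmark_stats import bootstrap_speedup_confidence_interval


def load(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# Assumed layout of each pickle: {model_name: {seed: (init_time, run_time)}}.
old = load("timings_main.pickle")
new = load("timings_branch.pickle")

rows = []
for model in old:
    # Pair the timings by seed so the comparison is paired, not pooled.
    seeds = sorted(old[model])
    old_init, old_run = zip(*(old[model][s] for s in seeds))
    new_init, new_run = zip(*(new[model][s] for s in seeds))

    init_mean, init_lo, init_hi = bootstrap_speedup_confidence_interval(old_init, new_init)
    run_mean, run_lo, run_hi = bootstrap_speedup_confidence_interval(old_run, new_run)

    rows.append({
        "model": model,
        "init time change": f"{init_mean:+.1%} [{init_lo:+.1%}, {init_hi:+.1%}]",
        "run time change": f"{run_mean:+.1%} [{run_lo:+.1%}, {run_hi:+.1%}]",
    })

print(pd.DataFrame(rows).to_string(index=False))
```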
I think you can set the permission inside the workflow file similar to this https://github.com/projectmesa/mesa/blob/main/.github%2Fworkflows%2Frelease.yml#L18-L20
According to ChatGPT it's something different:
That was just meant as an example. I would guess you need
If the 95% confidence interval is:
- fully below -3%: 🟢
- fully above +3%: 🔴
- else: 🔵
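A sketch of how that classification could be expressed in the comparison script; the thresholds follow the comment above, while the function name and signature are assumptions:

```python
def classify_change(lower: float, upper: float, threshold: float = 0.03) -> str:
    """Map a 95% confidence interval (as fractions, e.g. 0.05 == +5%) to an emoji.

    🟢 significant speedup, 🔴 significant slowdown, 🔵 no clear change.
    """
    if upper < -threshold:   # whole interval below -3%: clearly faster
        return "🟢"
    if lower > threshold:    # whole interval above +3%: clearly slower
        return "🔴"
    return "🔵"              # interval overlaps the ±3% band: inconclusive
```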
You never know what helps.
@Corvince Thanks for your insights. They're really useful. I think I zoomed in further on the issue: https://docs.github.com/en/actions/security-guides/automatic-token-authentication#permissions-for-the-github_token Check the column "Maximum access for pull requests from public forked repositories". Everything there is read-only. I'm pulling this PR in from my own fork EwoutH/mesa. Many contributors will do that, so this workflow has to work from public forks. But I understand very clearly why it's disabled by default; otherwise anyone could open any PR and do anything. So the solution stays the same as suggested here: we need to make a new, very specific but permissive token with "pull-requests: write" access that only runs on this workflow and is only allowed to make pull request comments.
Weird stuff. Can't figure it out without access to tokens and secrets unfortunately. Will wait until I have the permissions.
@EwoutH Do you mind if I commit to this branch with various minor updates?
Go ahead, and thanks for asking. With #1979 merged, I just wanted to rebase this branch on that PR; shall I do that first?
I'm going to take a bit of a weekend; I will take a look at the CI again tomorrow or Monday. (@jackiekazil or @tpike3, I would greatly appreciate it if I have the permissions by then.)
I see a major performance difference between this branch and master (factor 5 or so). I want to dig in and see which commit explains this difference. Will keep you posted.
In runtime? Quite sure it's this one: bc61345. For this branch I set the replications to 1 to iterate faster.
Should this one be closed in favor of the other PR?
Let's see how many attempts this is going to take