Scoring logging bug 🐛 - incorrect computation of time to target on val (s) in get_summary_df #791

Closed
Niccolo-Ajroldi opened this issue Oct 11, 2024 · 2 comments · Fixed by #792

Comments

@Niccolo-Ajroldi
Contributor

Niccolo-Ajroldi commented Oct 11, 2024

Description

In scoring/score_submissions.py, the function get_summary_df gathers evaluation statistics from submission logs into a DataFrame. When scoring a submission, this function is invoked on every workload, and the resulting concatenated DataFrames are saved as CSV files (<submission>_summary.csv).

The current implementation computes the time needed to reach the validation target as follows:

summary_df['time to target on val (s)'] = summary_df.apply(
    lambda x: x['time to best eval on val (s)']
    if x['val target reached'] else np.inf,
    axis=1)

As a result, time to target on val (s) equals time to best eval on val (s) whenever a submission reaches the target. However, the time to reach the validation target is usually lower than the time to the best eval score, since the target is first reached at or before the evaluation with the best score.
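To make the difference concrete, here is a minimal sketch on toy data. It is not the code from #792 or from the scoring pipeline; the helper names (time_to_target, time_to_best), the column names, and the numbers are assumptions chosen for illustration only.

```python
import numpy as np
import pandas as pd


def time_to_target(eval_df, metric_col, time_col, target, minimize=True):
  """Time of the FIRST eval that meets the target, or inf if never met.

  Hypothetical helper for illustration; not the repository's implementation.
  """
  if minimize:
    reached = eval_df[metric_col] <= target
  else:
    reached = eval_df[metric_col] >= target
  if not reached.any():
    return np.inf
  # idxmax on a boolean Series returns the label of the first True entry.
  return float(eval_df.loc[reached.idxmax(), time_col])


def time_to_best(eval_df, metric_col, time_col, minimize=True):
  """Time of the eval with the best metric value (what the buggy column reports)."""
  if minimize:
    best_idx = eval_df[metric_col].idxmin()
  else:
    best_idx = eval_df[metric_col].idxmax()
  return float(eval_df.loc[best_idx, time_col])


# Toy eval history: the loss keeps improving after the target is first met.
# Column names are assumed for this example.
evals = pd.DataFrame({
    'accumulated_submission_time': [100.0, 200.0, 300.0, 400.0],
    'validation/loss': [0.9, 0.5, 0.4, 0.35],
})

target = 0.5
t_target = time_to_target(
    evals, 'validation/loss', 'accumulated_submission_time', target)
t_best = time_to_best(evals, 'validation/loss', 'accumulated_submission_time')
print(t_target, t_best)
```

In this toy history the target is first met at the second evaluation, while the best score only appears at the last one, so reporting the time to best eval overstates the time to target.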

Performance profiles are not affected

Fortunately, this bug affects neither the performance profiles nor the final scores. Although the concatenated DataFrames are used to compute the performance profiles, that computation ignores the existing time to target on val (s) column and instead correctly recomputes the time to the eval target.

Source or Possible Fix

I have implemented a fix in #792. The final scores and the performance profiles are unaffected after the fix. However, <submission>_summary.csv changes drastically. Here is an example on two workloads for the prize qualification baseline algorithm (first study):

image

@priyakasimbeg
Contributor

priyakasimbeg commented Oct 11, 2024

Thank you for identifying this issue and submitting a fix with detailed analysis!
Confirming that this bug does not affect final scoring and performance profiles. The summary_df is only used for logging purposes and does not feed into the scoring pipeline.
Just verified that in the scoring pipeline the time to target is computed correctly here: https://github.com/mlcommons/algorithmic-efficiency/blob/main/scoring/performance_profile.py#L155

priyakasimbeg changed the title from "Scoring bug 🐛 - incorrect computation of time to target on val (s) in get_summary_df" to "Scoring logging bug 🐛 - incorrect computation of time to target on val (s) in get_summary_df" on Oct 11, 2024
@priyakasimbeg
Contributor

Fixed in #792
