Scoring logging bug 🐛 - incorrect computation of `time to target on val (s)` in `get_summary_df` #791

Niccolo-Ajroldi · 2024-10-11T16:09:58Z

Description

In `scoring/score_submissions.py`, the function `get_summary_df` is responsible for gathering evaluation statistics from submission logs into a DataFrame. When scoring a submission, this function is invoked on every workload, and the resulting concatenated DataFrames are saved as CSV files (`<submission>_summary.csv`).

The current implementation computes the time needed to reach the validation target as follows:

algorithmic-efficiency/scoring/score_submissions.py

Lines 91 to 94 in a23b5ea

    
    summary_df['time to target on val (s)'] = summary_df.apply(
        lambda x: x['time to best eval on val (s)']
        if x['val target reached'] else np.inf,
        axis=1)

As a result, `time to target on val (s)` equals `time to best eval on val (s)` whenever a submission reaches the target. However, the time to the validation target is usually lower than the time to the best eval score, since a run typically keeps improving after it first reaches the target.
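To illustrate the difference between the two quantities, here is a minimal sketch on a toy evaluation log. It is not the code from #792: the helper `time_to_target` and the column names `accumulated_submission_time` and `validation/accuracy` are assumptions chosen for illustration, and the numbers are made up.

```python
import numpy as np
import pandas as pd


def time_to_target(eval_df, metric_col, time_col, target, minimize=False):
    """Earliest time at which `metric_col` reaches `target`, or np.inf.

    Hypothetical helper for illustration only.
    """
    if minimize:
        reached = eval_df[metric_col] <= target
    else:
        reached = eval_df[metric_col] >= target
    if not reached.any():
        return np.inf
    # Earliest timestamp among the evaluations that meet the target.
    return float(eval_df.loc[reached, time_col].min())


# Toy evaluation log (made-up values): one row per evaluation.
evals = pd.DataFrame({
    "accumulated_submission_time": [100.0, 200.0, 300.0, 400.0],
    "validation/accuracy": [0.60, 0.72, 0.75, 0.74],
})
target = 0.70

# Time at which the target is first reached.
t_target = time_to_target(
    evals, "validation/accuracy", "accumulated_submission_time", target)

# Time of the best evaluation, which is what the buggy column reported.
best_row = evals["validation/accuracy"].idxmax()
t_best = float(evals.loc[best_row, "accumulated_submission_time"])

print(t_target, t_best)
```

In this toy log the target is first met at the second evaluation, while the best score only arrives at the third, so reporting the time of the best eval overstates the time to target.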

Performance profiles are not affected

Fortunately, this bug affects neither the performance profiles nor the final scores. Although the concatenated DataFrames are used to compute the performance profiles, that code ignores the existing `time to target on val (s)` column and instead computes the time to the validation target correctly.

Source or Possible Fix

I have implemented a fix in #792. The final scores and the performance profiles are unaffected after the fix. However, <submission>_summary.csv changes drastically. Here is an example on two workloads for the prize qualification baseline algorithm (first study):


priyakasimbeg · 2024-10-11T19:07:21Z

Thank you for identifying this issue and submitting a fix with detailed analysis!
Confirming that this bug does not affect final scoring and performance profiles. The `summary_df` is only used for logging purposes and does not feed into the scoring pipeline.
I just verified that the scoring pipeline computes the time to target correctly here: https://github.com/mlcommons/algorithmic-efficiency/blob/main/scoring/performance_profile.py#L155

priyakasimbeg · 2024-10-14T23:59:43Z

Fixed in #792

Niccolo-Ajroldi mentioned this issue Oct 11, 2024

Fix scoring bug in get_summary_df #792

Merged

priyakasimbeg changed the title ~~Scoring bug 🐛 - incorrect computation of time to target on val (s) in get_summary_df~~ Scoring logging bug 🐛 - incorrect computation of time to target on val (s) in get_summary_df Oct 11, 2024

priyakasimbeg closed this as completed Oct 14, 2024
