How to avoid using for loop in these kind of situations #1500
Hi experts,
Replies: 1 comment 7 replies
Well, first thing: the uproot.TBranch.array method reads data, and you don't want to read and re-read the same data over and over. You could pass a cache to the array method so that it would check the cache for an already-read value instead of getting it from the file, but it's probably easier to just move the read commands out of the loop and use those arrays in the loop, rather than reading them there:

    signal_xgb = event_signal_xgb['DiphotonMVA_self'].array(library = 'np')
    bkg_pp_xgb = event_bkg_pp_xgb['DiphotonMVA_self'].array(library = 'np')
    bkg_pfff_xgb = event_bkg_pfff_xgb['DiphotonMVA_self'].array(library = 'np')

Secondly, the sums that don't depend on i can be computed once, before the loop starts:

    signal_xgb_weights_sum = event_signal_xgb_weights.sum()
    bkg_pp_xgb_weights_sum = event_bkg_pp_xgb_weights.sum()
    bkg_pfff_xgb_weights_sum = event_bkg_pfff_xgb_weights.sum()

What remains is a for loop over 1000 values of i.

So first, try the fixes described above, and if it's still a problem, you might want to try using np.sort, np.cumsum and/or np.searchsorted. You want to add up everything in an array that's greater than a given value.

While you are making these incremental improvements, check the results of the optimized code against the results you already have with the slow code. Don't try to do all the improvements at once: small steps with correctness-checking at each step (i.e. "test-driven development") will likely involve less debugging.
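To make the np.cumsum / np.searchsorted suggestion concrete, here is a minimal sketch. It assumes the loop computes, for each of 1000 cut values, the summed weight of the events whose score is above the cut; that is my guess at the shape of the problem, not something confirmed by the question. The function names, the synthetic arrays and the `cuts` grid are all hypothetical stand-ins. I use np.argsort rather than np.sort so that the weights can be reordered together with the scores.

```python
import numpy as np


def weighted_sum_above_loop(scores, weights, cuts):
    """Slow reference: one boolean mask and one sum per cut value."""
    return np.array([weights[scores > cut].sum() for cut in cuts])


def weighted_sum_above_sorted(scores, weights, cuts):
    """Same quantity from one sort, one cumulative sum and one searchsorted."""
    order = np.argsort(scores)
    sorted_scores = scores[order]
    # cumulative[k] is the summed weight of the k lowest-scoring events.
    cumulative = np.concatenate(([0.0], np.cumsum(weights[order])))
    # side="right" counts the events with score <= cut, which are excluded.
    n_at_or_below = np.searchsorted(sorted_scores, cuts, side="right")
    return cumulative[-1] - cumulative[n_at_or_below]


# Synthetic stand-in data (hypothetical, not taken from the question).
rng = np.random.default_rng(seed=0)
scores = rng.uniform(0.0, 1.0, size=10_000)
weights = rng.uniform(0.5, 1.5, size=10_000)
cuts = np.linspace(0.0, 1.0, 1000)

# Check the optimized version against the slow one before trusting it.
slow = weighted_sum_above_loop(scores, weights, cuts)
fast = weighted_sum_above_sorted(scores, weights, cuts)
assert np.allclose(slow, fast)
```

The final assertion is the correctness check described above: keep the slow version around as a reference and compare against it after every change.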