Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix rzChisquared NaN's on CPU Backend for T5 #97

Conversation

GNiendorf
Copy link
Member

@GNiendorf GNiendorf commented Sep 25, 2024

Resolves an issue where on the CPU backend the ntuple would contain NaN values for some rzChisquared t5 values by switching alpaka::math::isnan to edm::isNotFinite.

@GNiendorf GNiendorf linked an issue Sep 25, 2024 that may be closed by this pull request
@GNiendorf
Copy link
Member Author

/run standalone

@GNiendorf
Copy link
Member Author

NaN's still exist in the pT5_rzChiSquared variable, so this is not a complete fix. Those must be coming from a different issue.

Copy link

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     44.3    325.8    115.2     46.8     92.1    504.7    113.6    149.0    100.8      2.4    1494.6     945.7+/- 248.6     409.3   explicit_cache[s=4] (target branch)
   avg     43.9    325.3    115.9     46.8     92.8    501.9    113.8    151.9    102.9      1.9    1497.1     951.3+/- 248.6     409.4   explicit_cache[s=4] (this PR)

@GNiendorf
Copy link
Member Author

// calculation is detailed documented here https://indico.cern.ch/event/1185895/contributions/4982756/attachments/2526561/4345805/helix%20pT3%20summarize.pdf
float diffr, diffz;
float p = alpaka::math::sqrt(acc, Px * Px + Py * Py + Pz * Pz);
float rou = a / p;
if (moduleSubdet == lst::Endcap) {
float s = (zsi - z1) * p / Pz;
float x = x1 + Px / a * alpaka::math::sin(acc, rou * s) - Py / a * (1 - alpaka::math::cos(acc, rou * s));
float y = y1 + Py / a * alpaka::math::sin(acc, rou * s) + Px / a * (1 - alpaka::math::cos(acc, rou * s));
diffr = alpaka::math::abs(acc, rtsi - alpaka::math::sqrt(acc, x * x + y * y)) * 100;
}
if (moduleSubdet == lst::Barrel) {
float paraA = r1 * r1 + 2 * (Px * Px + Py * Py) / (a * a) + 2 * (y1 * Px - x1 * Py) / a - rtsi * rtsi;
float paraB = 2 * (x1 * Px + y1 * Py) / a;
float paraC = 2 * (y1 * Px - x1 * Py) / a + 2 * (Px * Px + Py * Py) / (a * a);
float A = paraB * paraB + paraC * paraC;
float B = 2 * paraA * paraB;
float C = paraA * paraA - paraC * paraC;
float sol1 = (-B + alpaka::math::sqrt(acc, B * B - 4 * A * C)) / (2 * A);
float sol2 = (-B - alpaka::math::sqrt(acc, B * B - 4 * A * C)) / (2 * A);
float solz1 = alpaka::math::asin(acc, sol1) / rou * Pz / p + z1;
float solz2 = alpaka::math::asin(acc, sol2) / rou * Pz / p + z1;
float diffz1 = alpaka::math::abs(acc, solz1 - zsi) * 100;
float diffz2 = alpaka::math::abs(acc, solz2 - zsi) * 100;
diffz = alpaka::math::min(acc, diffz1, diffz2);
}

// calculation is copied from PixelTriplet.cc lst::computePT3RZChiSquared
float diffr = 0, diffz = 0;
float rou = a / p;
// for endcap
float s = (zsi - z_init) * p / Pz;
float x = x_init + Px / a * alpaka::math::sin(acc, rou * s) - Py / a * (1 - alpaka::math::cos(acc, rou * s));
float y = y_init + Py / a * alpaka::math::sin(acc, rou * s) + Px / a * (1 - alpaka::math::cos(acc, rou * s));
diffr = (rtsi - alpaka::math::sqrt(acc, x * x + y * y)) * 100;
// for barrel
if (layeri <= 6) {
float paraA =
rt_init * rt_init + 2 * (Px * Px + Py * Py) / (a * a) + 2 * (y_init * Px - x_init * Py) / a - rtsi * rtsi;
float paraB = 2 * (x_init * Px + y_init * Py) / a;
float paraC = 2 * (y_init * Px - x_init * Py) / a + 2 * (Px * Px + Py * Py) / (a * a);
float A = paraB * paraB + paraC * paraC;
float B = 2 * paraA * paraB;
float C = paraA * paraA - paraC * paraC;
float sol1 = (-B + alpaka::math::sqrt(acc, B * B - 4 * A * C)) / (2 * A);
float sol2 = (-B - alpaka::math::sqrt(acc, B * B - 4 * A * C)) / (2 * A);
float solz1 = alpaka::math::asin(acc, sol1) / rou * Pz / p + z_init;
float solz2 = alpaka::math::asin(acc, sol2) / rou * Pz / p + z_init;
float diffz1 = (solz1 - zsi) * 100;
float diffz2 = (solz2 - zsi) * 100;
// Alpaka : Needs to be moved over
if (alpaka::math::isnan(acc, diffz1))
diffz = diffz2;
else if (alpaka::math::isnan(acc, diffz2))
diffz = diffz1;
else {
diffz = (alpaka::math::abs(acc, diffz1) < alpaka::math::abs(acc, diffz2)) ? diffz1 : diffz2;
}
}

// calculation is detailed documented here https://indico.cern.ch/event/1185895/contributions/4982756/attachments/2526561/4345805/helix%20pT3%20summarize.pdf
float diffr, diffz;
float p = alpaka::math::sqrt(acc, Px * Px + Py * Py + Pz * Pz);
float rou = a / p;
if (moduleSubdet == lst::Endcap) {
float s = (zsi - z1) * p / Pz;
float x = x1 + Px / a * alpaka::math::sin(acc, rou * s) - Py / a * (1 - alpaka::math::cos(acc, rou * s));
float y = y1 + Py / a * alpaka::math::sin(acc, rou * s) + Px / a * (1 - alpaka::math::cos(acc, rou * s));
diffr = alpaka::math::abs(acc, rtsi - alpaka::math::sqrt(acc, x * x + y * y)) * 100;
residual = diffr;
}
if (moduleSubdet == lst::Barrel) {
float paraA = r1 * r1 + 2 * (Px * Px + Py * Py) / (a * a) + 2 * (y1 * Px - x1 * Py) / a - rtsi * rtsi;
float paraB = 2 * (x1 * Px + y1 * Py) / a;
float paraC = 2 * (y1 * Px - x1 * Py) / a + 2 * (Px * Px + Py * Py) / (a * a);
float A = paraB * paraB + paraC * paraC;
float B = 2 * paraA * paraB;
float C = paraA * paraA - paraC * paraC;
float sol1 = (-B + alpaka::math::sqrt(acc, B * B - 4 * A * C)) / (2 * A);
float sol2 = (-B - alpaka::math::sqrt(acc, B * B - 4 * A * C)) / (2 * A);
float solz1 = alpaka::math::asin(acc, sol1) / rou * Pz / p + z1;
float solz2 = alpaka::math::asin(acc, sol2) / rou * Pz / p + z1;
float diffz1 = alpaka::math::abs(acc, solz1 - zsi) * 100;
float diffz2 = alpaka::math::abs(acc, solz2 - zsi) * 100;
diffz = alpaka::math::min(acc, diffz1, diffz2);
residual = diffz;
}

Looks like this calculation is duplicated three times in the code, two without the NaN checks and one with the NaN checks. I guess the pT5's calculation needs the NaN checks? @YonsiG

@YonsiG
Copy link

YonsiG commented Sep 26, 2024

In principal they all need NaN checks. thank you for the fix of edm::isNotFinite! I think for the pT5, we have a special treatment and only apply rzchi2 cut for <50GeV tracks. since NaN usually appears in high pt tracks cannot have radius calculated, in pT5 I didn't specify this check.

@slava77
Copy link

slava77 commented Sep 27, 2024

/run all

Copy link

There was a problem while building and running in standalone mode. The logs can be found here.

@ariostas
Copy link
Member

It failed because it tried to upload the plots at the same time as another job. We can just restart the job in a bit.

@slava77
Copy link

slava77 commented Sep 27, 2024

linter is not happy.
@GNiendorf did you run code-checks in 14_2_X?

Copy link

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@GNiendorf
Copy link
Member Author

linter is not happy. @GNiendorf did you run code-checks in 14_2_X?

See comment on the other PR, it suggests to format files that are not apart of these PR's

#98 (comment)

@slava77
Copy link

slava77 commented Sep 27, 2024

/run all

Copy link

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     45.2    322.8    115.7     47.1     91.7    549.0    111.7    150.4    106.6      1.8    1542.0     947.8+/- 249.5     419.7   explicit_cache[s=4] (target branch)
   avg     44.7    326.3    117.8     48.6     93.9    508.2    113.1    150.5    102.2      3.1    1508.5     955.5+/- 243.3     413.3   explicit_cache[s=4] (this PR)

Copy link

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     44.6    320.9    116.7     48.2     93.2    549.6    111.5    150.9    106.3      1.7    1543.6     949.5+/- 248.8     424.0   explicit_cache[s=4] (target branch)
   avg     45.0    326.3    116.9     47.2     92.1    506.3    113.4    152.6    100.7      2.0    1502.5     951.2+/- 248.2     409.7   explicit_cache[s=4] (this PR)

@slava77
Copy link

slava77 commented Sep 27, 2024

Here is a timing comparison:

pLS column went down by about 10% in two repeated tests from around 550

Seems unrelated, but oddly consistent, unclear why.

Copy link

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

Copy link

@slava77 slava77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may help to add in the PR title that the fix is only in T5s

@slava77 slava77 merged commit 23d7a92 into CMSSW_14_1_0_pre3_LST_X_LSTCore_realfiles_batch1_devel Sep 28, 2024
2 of 3 checks passed
@slava77
Copy link

slava77 commented Sep 28, 2024

similar to a few other recent PRs, the leftover linter issues are in the code not touched by this PR

@GNiendorf GNiendorf changed the title Fix rzChisquared NaN's on CPU Backend Fix rzChisquared NaN's on CPU Backend for T5 Sep 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NaN Values in rzChiSquared Variables
4 participants