-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use probability sampling over periodic sampling #213
Conversation
The following two outputs show that the sampling with probability 1.0% works properly when applied to kernel timer Kokkos tool for stream benchmark in the Kokkos core benchmark folder. The first output is with Kokkos tools global fences being on (tool-induced fencing is enabled) and the second output is with global fencing turned off. In the second case, fencing is not invoked, as expected. Also, note that the set of Kokkos kernel invocation numbers that is sampled is different across these two different runs. The random number generator is seeded with the current time, making the invocations sampled different. The following is run on a MacOS with gcc and Kokkos 4.1.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add a user option to set the seed, and don't delete the erase
Erase got deleted (there on the develop branch) Took it out due to copy-paste
User can input seed for RNG for probabilistic sampling
The below is a test with the most recently committed version with KOKKOS_TOOLS_SEED set, as requested by @crtrott. Two separate runs of the program stream were done with the following environment variables set. As seen from the output of both runs, they both have the same sequence of events sampled, showing that the manual seed rather than time-generated seed is working for this case. Note that the output is truncated for easily viewing the output within this GitHub issue.
|
Done. |
…or seed for user input + README
This PR now has tests for the probability sampling and is rebased with develop. |
Fixing inconsistencies of diff with develop that shouldn't be there. These changes do not belong to this PR. They changes comprise primarily of minor changes with printed output. These unintended changes that I have taken out don't change any core logic or control flow in the program.
…riable should refer to that variable and not sampler skip
"sampling were set...\n"; | ||
} | ||
tool_prob_num = 10.0; | ||
kernelSampleSkip = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You hard code both tool_prob_num and kernelSampleSkip ... the user settings are ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put the entire thing into a if (tool_prob_num == -1.0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
I think the key suggestion that handles the issue raised is handled by the single one you suggested in the end of this review.
We might want to catch the case when tool_prob_num is negative but not -1.0 - I think that should happen only if there is some non-deterministic bit flip like event that causes tool_prob_num
to be negative.
Note I have now also put comments for each possible user input and how it is handled.
<< " percent. The skip rate for sampler will not be used.\n"; | ||
} | ||
|
||
kernelSampleSkip = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And again setting sampleskip ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted this.
This is a redundant instruction in the program.
For later: we may want to consider what to do if, e.g., a cosmic ray flips a bit in the environment variable (I think we aren't trying to build resilience in this sampler).
|
||
const char* tool_sample = getenv("KOKKOS_TOOLS_SAMPLER_SKIP"); | ||
if (NULL != tool_sample) { | ||
tool_prob_num = 100.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this means SamplerSkip is prioritized over random setting, because it overwrites the previous setting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is fixed now.
<< tool_prob_num << "\n"; | ||
} | ||
kernelSampleSkip = 1; | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah I see return, but none of the other stuff has returns ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@crtrott Please see updated version. I think your fix suggested below takes care of this.
Fix from a previous suggestion from contributed commit - it may have been misunderstood that probability sampling is the priority here (I should have caught this). Co-authored-by: Christian Trott <crtrott@sandia.gov>
I put comments to make clear the desired behavior. Note that the conditional shouldn't be needed.
Fix #180 .
This PR is related to old PR #181
The current Kokkos sampler utility uses periodic sampling via a sampler skip rate. Doing this is often restrictive when sampling profiling and debugging data. For example, doing this can miss out on important data not in the periodicity of kernel invocations. The goal of this PR for the Kokkos sampler utility is to allow user to use random sampling primarily and periodic sampling secondarily via environment variables in the form
KOKKOS_TOOLS_SAMPLER_xyz
.Since the solution should not allow for a combination of both periodicity and probability, the probability will always be chosen.
For example, let us say that a user requests a a
Kokkos::parallel_for()
every 20th invocation of thatKokkos::parallel_for()
and requests gather time spent onKokkos::parallel_for()
with probability 63% on each invocation of thatKokkos::parallel_for()
. Then, the sampler will not skip trying to time any invocations of the Kokkos::parallel_for() but it will obtain a timing with probability 63% on each invocation of thatKokkos::parallel_for()
.See the
common/kokkos-sampler/README.md
directory for a high-level overview - in English - of the changes.For later: I will put in slide in the Kokkos Tools tutorial slide on sampling and filtering to explain how to use these utilities.