Insights not generated #133

ben-manes · 2022-04-30T05:04:31Z

The only hint is that the completion step reports Source code overwritten errors, which I think is due to CI recompiling to remove debug symbols the artifact.

The text was updated successfully, but these errors were encountered:

varunsh-coder · 2022-04-30T05:16:31Z

Thanks for reporting @ben-manes. I’m looking into it.

ben-manes · 2022-04-30T05:18:08Z

Thanks @varunsh-coder. It was a matrix run and I noticed both generated the same insights url, so maybe that confused it? (enabled on Caffeine Tests 11 and 17)

varunsh-coder · 2022-04-30T05:33:12Z

@ben-manes it generates a single insights URL per workflow run, so that was not the issue. I re-ran the analysis and it worked fine. You can see the insights here: https://app.stepsecurity.io/github/ben-manes/caffeine/actions/runs/2248437529

I will investigate why it did not generate it the first time. I think it was a transient error fetching workflow details from GitHub API.

Thanks for trying out harden-runner! I did notice the large number of annotations for source code overwritten. I am trying to figure out a good user experience to disable those if it is a false positive. Let me know if you have ideas. Thanks!

ben-manes · 2022-04-30T05:34:24Z

okay, awesome! You are welcome to close this then 🙂

ben-manes · 2022-04-30T05:45:18Z

hmm.. after trying a run with the suggested block list it fails with connection refused

https://github.com/ben-manes/caffeine/runs/6238249024?check_suite_focus=true

ben-manes · 2022-04-30T05:49:55Z

Appended to the list, trying again
https://github.com/ben-manes/caffeine/runs/6238294593?check_suite_focus=true

varunsh-coder · 2022-04-30T06:05:47Z

Appended to the list, trying again
https://github.com/ben-manes/caffeine/runs/6238294593?check_suite_focus=true

api.azul.com did not show in the insights because of a known bug - where two domains resolve to same IP. It was resolved in the first run.
https://github.com/ben-manes/caffeine/runs/6236868734?check_suite_focus=true#step:26:30

services.gradle.org was not resolved in the first run. That one is odd.

You will need to add couple more I think. These were part of caffeine tests (17) but were not correlated with the right steps, so did not show up in the recommended policy. Will look into this.
coveralls.io:443
sonarcloud.io:443

ben-manes · 2022-04-30T06:08:02Z

done, thank you!
(2 hour builds unfortunately)

https://github.com/ben-manes/caffeine/actions/runs/2249141656

varunsh-coder · 2022-04-30T13:26:32Z

done, thank you! (2 hour builds unfortunately)

https://github.com/ben-manes/caffeine/actions/runs/2249141656

Looks like this one ran fine and insights were also generated. I have created separate issues to track findings from these runs.

ben-manes · 2022-04-30T14:28:43Z

Perfect, thanks!

varunsh-coder · 2022-05-01T00:22:33Z

Hi @ben-manes I am observing the runs and it looks like there were more endpoints called in master, so you had to adjust.
You should not need gheus1ubt20cus1diag.blob.core.windows.net:443 and it anyways changes with each run. So if you get a chance, you can remove that one.

I am also thinking of adding option of check status in audit mode and store allowed endpoints on the backend, so developers can more easily approve endpoints. This is feedback from one of the users. Let me know if you can think of other ideas to improve the experience. Thanks!

ben-manes · 2022-05-01T04:27:52Z

thanks, fixed! The master build is a little different since it publishes a snapshot dependency, which I had overlooked.

That sounds like a good idea to me. I do like your approach of having it very explicitly as part of the configuration, but it is not a reusable list across workflow files. The backend approach would promote setting up this action on all cases and not just the core build (in my case, static analyzers). I also did not put it on the release workflow out of fear that it would break it. The downside is that it requires some login which for an OSS project could be lost if that maintainer disappears.

Overall a very easy and nice security enhancement - thank you! I have no complaints. 🙂

varunsh-coder · 2022-05-01T14:15:50Z

thanks, fixed! The master build is a little different since it publishes a snapshot dependency, which I had overlooked.

That sounds like a good idea to me. I do like your approach of having it very explicitly as part of the configuration, but it is not a reusable list across workflow files. The backend approach would promote setting up this action on all cases and not just the core build (in my case, static analyzers). I also did not put it on the release workflow out of fear that it would break it. The downside is that it requires some login which for an OSS project could be lost if that maintainer disappears.

Overall a very easy and nice security enhancement - thank you! I have no complaints. 🙂

Thanks for the feedback. Your comment about not being reusable across workflows reminded be of this idea, which is to store allowed endpoints in a separate file in the repo, e.g. .github/harder-runner.yml. That could also be a way to disable source code overwrite warnings for some workflows/ jobs.

BTW, are the source code overwrites in the core build job expected? How can one differentiate between those overwrites and a malicious overwrite by ones of the steps? Would you expect the same/ similar overwrites in the release job as well?

ben-manes · 2022-05-01T17:37:46Z

This feels like a problem that there should be an idiomatic solution for. What about composite actions or reusable workflows so that users define their own variant of harden-runner configured with their preferred defaults? This would avoid any work on your side except for documenting the pattern.

The source code overwrite is a hack for my project and not idiomatic, so I think you're okay as is. I use a code generator to create classes for every configuration to favor reducing the memory footprint at the cost of a larger dependency (as unloaded classes are free at runtime). For example a cache without any expiration setting doesn't need a per-entry expiration timestamp field. These code generated classes are compiled without debug symbols as simple data holders, which shaves off ~150kb without impacting debuggability. The hack is that the code is compiled once in full and then the subset compiled again without debug, and those classes replaced. A better approach would be to make these independent compilation and combine them later, but that requires teasing them apart and not being so lazy with this ad hoc approach. In that respect your warnings are a good reminder that the build should be improved, as that also busts the build's cache and can cause it to rerun those tasks if invoked later.

ben-manes · 2022-05-01T19:42:59Z

work in progress, but this clean up commit should avoid those overwrites. Oddly the compileCodeGenJava task is still being invalidated when used locally, so trying to figure out why it is not cached as expected.

ben-manes · 2022-05-01T22:25:01Z

Okay, the build is improved this should be in place. The prior case of a rebuild is because the compiler itself depends on the cache for its AST and the build system incorrectly rewrote that to a project dependency after the jar was built.

However looking at a corrected build and despite the tasks reported as UP-TO-DATE (thus no work), the runner claims the generated files were overwritten. I don't have an explanation for this behavior.

varunsh-coder · 2022-05-01T22:42:25Z

Okay, the build is improved this should be in place. The prior case of a rebuild is because the compiler itself depends on the cache for its AST and the build system incorrectly rewrote that to a project dependency after the jar was built.

However looking at a corrected build and despite the tasks reported as UP-TO-DATE (thus no work), the runner claims the generated files were overwritten. I don't have an explanation for this behavior.

I am making updates to correlate the file overwrite event with the step/ line that made the change based on timestamp. Once that is done, it will be visible in the insights website and should help with understanding why this is happening.

varunsh-coder · 2022-05-01T22:45:01Z

This feels like a problem that there should be an idiomatic solution for. What about composite actions or reusable workflows so that users define their own variant of harden-runner configured with their preferred defaults? This would avoid any work on your side except for documenting the pattern.

The source code overwrite is a hack for my project and not idiomatic, so I think you're okay as is. I use a code generator to create classes for every configuration to favor reducing the memory footprint at the cost of a larger dependency (as unloaded classes are free at runtime). For example a cache without any expiration setting doesn't need a per-entry expiration timestamp field. These code generated classes are compiled without debug symbols as simple data holders, which shaves off ~150kb without impacting debuggability. The hack is that the code is compiled once in full and then the subset compiled again without debug, and those classes replaced. A better approach would be to make these independent compilation and combine them later, but that requires teasing them apart and not being so lazy with this ad hoc approach. In that respect your warnings are a good reminder that the build should be improved, as that also busts the build's cache and can cause it to rerun those tasks if invoked later.

Composite and reusable workflows are interesting options to solve this. I think composite workflows might create friction for onboarding and developers will likely prefer an easier option. Reusable workflows could work, I need to learn more about that. Will investigate...

ben-manes · 2022-05-01T23:08:37Z

I am making updates to correlate the file overwrite event with the step/ line that made the change based on timestamp. Once that is done, it will be visible in the insights website and should help with understanding why this is happening.

That will be very useful, thanks!

A guess is that it runs the codegen task but observes that the output hasn't changed via hashing, and determines it is uptodate so it does not invalidate downstream tasks. Since the task is so inexpensive it would appear otherwise to have not done any work. That wouldn't be the very obvious behavior, but I could see it making sense as a tooling author as an optimization, differentiated by the SKIPPED if irrelevant (but actually means something else, but oh well). The tasks themselves don't indicate anything regarding how to determine if they are uptodate. There might be some task input I need to set to get it to avoid regenerating, if my hypothesis is correct.

ben-manes · 2022-05-01T23:25:10Z

stats doesn't indicate any changes when running locally, and a scan does say it has savings but also that it is not cachable, so not sure.

ben-manes · 2022-05-02T00:50:13Z

I got it. Fixing the task caching and now the runner does not report any overwritten files. This is a nice set of hygienic build cleanups, so I think it was a good prompt.

varunsh-coder · 2022-05-02T01:30:39Z

I got it. Fixing the task caching and now the runner does not report any overwritten files. This is a nice set of hygienic build cleanups, so I think it was a good prompt.

Great! Thanks for the feedback!

varunsh-coder · 2022-05-02T13:25:15Z

I am making updates to correlate the file overwrite event with the step/ line that made the change based on timestamp. Once that is done, it will be visible in the insights website and should help with understanding why this is happening.

That will be very useful, thanks!

A guess is that it runs the codegen task but observes that the output hasn't changed via hashing, and determines it is uptodate so it does not invalidate downstream tasks. Since the task is so inexpensive it would appear otherwise to have not done any work. That wouldn't be the very obvious behavior, but I could see it making sense as a tooling author as an optimization, differentiated by the SKIPPED if irrelevant (but actually means something else, but oh well). The tasks themselves don't indicate anything regarding how to determine if they are uptodate. There might be some task input I need to set to get it to avoid regenerating, if my hypothesis is correct.

I have not implemented the UI yet, but based on the logs, the overwrite is in this line.

Mon, 02 May 2022 08:41:38 GMT > Task :guava:compileTestJava UP-TO-DATE

The timestamp of overwrite is logged in Post Harden Runner step here.

[Source code overwritten] file: /home/runner/work/caffeine/caffeine/simulator/build/generated/sources/annotationProcessor/java/main/com/github/benmanes/caffeine/cache/simulator/report/AutoValue_Metrics.java syscall: openat by exe: /opt/hostedtoolcache/Java_Zulu_jdk/17.0.3-7/x64/bin/java [7e3d556664b0ff8f12650d53f957476669df45c99c790e8eb8eade18b55ed596] **Timestamp: 2022-05-02T08:41:42.485Z**

ben-manes · 2022-05-02T15:17:13Z

It is actually the next line,

Task :simulator:compileJava

I dug into that last night and it is Google AutoValue breaking incremental compilation. While I’m using a fixed version since they don’t have tests it’s likely a regression.
google/auto#1075

ben-manes · 2022-05-02T17:59:20Z

Opened a ticket google/auto#1315

ben-manes · 2022-05-04T15:09:57Z

FYI this is fixed too (and my fault, of course). It was a really nice experience setting this up and fixing the issues highlighted. 😊

varunsh-coder · 2022-05-04T19:13:55Z

FYI this is fixed too (and my fault, of course). It was a really nice experience setting this up and fixing the issues highlighted. 😊

Thanks for the update! I am curious if the exact step was Task :simulator:compileJava or Task :guava:compileTestJava UP-TO-DATE. I am still not sure why the correlation based on time is not accurate. The file write was reported at Task :guava:compileTestJava UP-TO-DATE.

ben-manes · 2022-05-04T19:47:20Z

It was definitely the :simulator:compileJava task. Are you judging by epoch (seconds) resolution? There could easily be a race as the up-to-date check would be instant, so even millis might be too coarse.

varunsh-coder · 2022-05-04T20:00:48Z

It was definitely the :simulator:compileJava task. Are you judging by epoch (seconds) resolution? There could easily be a race as the up-to-date check would be instant, so even millis might be too coarse.

As per the downloaded log archive, which has better precision, these are the timestamps:
2022-05-02T08:41:38.6610262Z > Task :guava:compileTestJava UP-TO-DATE
2022-05-02T08:41:53.3611024Z > Task :simulator:compileJava

As per harden-runner log, the first file overwrite was at 2022-05-02T08:41:42.485Z. So as per this info, Task :simulator:compileJava had not started at that time. Or may be workflow logs the Task after task is completed.

ben-manes · 2022-05-04T20:03:31Z

oh yes, Gradle displays it after it completes for TERM=dumb mode of a CI. In a rich terminal it is an updated progress line.

varunsh-coder · 2022-05-04T21:41:20Z

oh yes, Gradle displays it after it completes for TERM=dumb mode of a CI. In a rich terminal it is an updated progress line.

ok, that explains it. Thanks!

This was referenced Apr 30, 2022

Retry calls to GitHub API when generating insights #134

Closed

Outbound calls not correlated with the right step #135

Closed

Outbound domain not included if resolved to same IP as another domain #136

Closed

ben-manes closed this as completed Apr 30, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Insights not generated #133

Insights not generated #133

ben-manes commented Apr 30, 2022

varunsh-coder commented Apr 30, 2022

ben-manes commented Apr 30, 2022

varunsh-coder commented Apr 30, 2022

ben-manes commented Apr 30, 2022

ben-manes commented Apr 30, 2022

ben-manes commented Apr 30, 2022

varunsh-coder commented Apr 30, 2022 •

edited

Loading

ben-manes commented Apr 30, 2022

varunsh-coder commented Apr 30, 2022

ben-manes commented Apr 30, 2022

varunsh-coder commented May 1, 2022

ben-manes commented May 1, 2022

varunsh-coder commented May 1, 2022

ben-manes commented May 1, 2022 •

edited

Loading

ben-manes commented May 1, 2022

ben-manes commented May 1, 2022

varunsh-coder commented May 1, 2022

varunsh-coder commented May 1, 2022

ben-manes commented May 1, 2022

ben-manes commented May 1, 2022

ben-manes commented May 2, 2022

varunsh-coder commented May 2, 2022

varunsh-coder commented May 2, 2022

ben-manes commented May 2, 2022

ben-manes commented May 2, 2022

ben-manes commented May 4, 2022

varunsh-coder commented May 4, 2022

ben-manes commented May 4, 2022

varunsh-coder commented May 4, 2022

ben-manes commented May 4, 2022

varunsh-coder commented May 4, 2022

Insights not generated #133

Insights not generated #133

Comments

ben-manes commented Apr 30, 2022

varunsh-coder commented Apr 30, 2022

ben-manes commented Apr 30, 2022

varunsh-coder commented Apr 30, 2022

ben-manes commented Apr 30, 2022

ben-manes commented Apr 30, 2022

ben-manes commented Apr 30, 2022

varunsh-coder commented Apr 30, 2022 • edited Loading

ben-manes commented Apr 30, 2022

varunsh-coder commented Apr 30, 2022

ben-manes commented Apr 30, 2022

varunsh-coder commented May 1, 2022

ben-manes commented May 1, 2022

varunsh-coder commented May 1, 2022

ben-manes commented May 1, 2022 • edited Loading

ben-manes commented May 1, 2022

ben-manes commented May 1, 2022

varunsh-coder commented May 1, 2022

varunsh-coder commented May 1, 2022

ben-manes commented May 1, 2022

ben-manes commented May 1, 2022

ben-manes commented May 2, 2022

varunsh-coder commented May 2, 2022

varunsh-coder commented May 2, 2022

ben-manes commented May 2, 2022

ben-manes commented May 2, 2022

ben-manes commented May 4, 2022

varunsh-coder commented May 4, 2022

ben-manes commented May 4, 2022

varunsh-coder commented May 4, 2022

ben-manes commented May 4, 2022

varunsh-coder commented May 4, 2022

varunsh-coder commented Apr 30, 2022 •

edited

Loading

ben-manes commented May 1, 2022 •

edited

Loading