Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Insights not generated #133

Closed
ben-manes opened this issue Apr 30, 2022 · 31 comments
Closed

Insights not generated #133

ben-manes opened this issue Apr 30, 2022 · 31 comments

Comments

@ben-manes
Copy link

The only hint is that the completion step reports Source code overwritten errors, which I think is due to CI recompiling to remove debug symbols the artifact.

@varunsh-coder
Copy link
Member

Thanks for reporting @ben-manes. I’m looking into it.

@ben-manes
Copy link
Author

Thanks @varunsh-coder. It was a matrix run and I noticed both generated the same insights url, so maybe that confused it? (enabled on Caffeine Tests 11 and 17)

@varunsh-coder
Copy link
Member

@ben-manes it generates a single insights URL per workflow run, so that was not the issue. I re-ran the analysis and it worked fine. You can see the insights here: https://app.stepsecurity.io/github/ben-manes/caffeine/actions/runs/2248437529

I will investigate why it did not generate it the first time. I think it was a transient error fetching workflow details from GitHub API.

Thanks for trying out harden-runner! I did notice the large number of annotations for source code overwritten. I am trying to figure out a good user experience to disable those if it is a false positive. Let me know if you have ideas. Thanks!

@ben-manes
Copy link
Author

okay, awesome! You are welcome to close this then 🙂

@ben-manes
Copy link
Author

hmm.. after trying a run with the suggested block list it fails with connection refused

https://github.com/ben-manes/caffeine/runs/6238249024?check_suite_focus=true

@ben-manes
Copy link
Author

Appended to the list, trying again
https://github.com/ben-manes/caffeine/runs/6238294593?check_suite_focus=true

@varunsh-coder
Copy link
Member

varunsh-coder commented Apr 30, 2022

Appended to the list, trying again
https://github.com/ben-manes/caffeine/runs/6238294593?check_suite_focus=true

api.azul.com did not show in the insights because of a known bug - where two domains resolve to same IP. It was resolved in the first run.
https://github.com/ben-manes/caffeine/runs/6236868734?check_suite_focus=true#step:26:30

services.gradle.org was not resolved in the first run. That one is odd.

You will need to add couple more I think. These were part of caffeine tests (17) but were not correlated with the right steps, so did not show up in the recommended policy. Will look into this.
coveralls.io:443
sonarcloud.io:443

@ben-manes
Copy link
Author

done, thank you!
(2 hour builds unfortunately)

https://github.com/ben-manes/caffeine/actions/runs/2249141656

@varunsh-coder
Copy link
Member

done, thank you! (2 hour builds unfortunately)

https://github.com/ben-manes/caffeine/actions/runs/2249141656

Looks like this one ran fine and insights were also generated. I have created separate issues to track findings from these runs.

@ben-manes
Copy link
Author

Perfect, thanks!

@varunsh-coder
Copy link
Member

Hi @ben-manes I am observing the runs and it looks like there were more endpoints called in master, so you had to adjust.
You should not need gheus1ubt20cus1diag.blob.core.windows.net:443 and it anyways changes with each run. So if you get a chance, you can remove that one.

I am also thinking of adding option of check status in audit mode and store allowed endpoints on the backend, so developers can more easily approve endpoints. This is feedback from one of the users. Let me know if you can think of other ideas to improve the experience. Thanks!

@ben-manes
Copy link
Author

thanks, fixed! The master build is a little different since it publishes a snapshot dependency, which I had overlooked.

That sounds like a good idea to me. I do like your approach of having it very explicitly as part of the configuration, but it is not a reusable list across workflow files. The backend approach would promote setting up this action on all cases and not just the core build (in my case, static analyzers). I also did not put it on the release workflow out of fear that it would break it. The downside is that it requires some login which for an OSS project could be lost if that maintainer disappears.

Overall a very easy and nice security enhancement - thank you! I have no complaints. 🙂

@varunsh-coder
Copy link
Member

thanks, fixed! The master build is a little different since it publishes a snapshot dependency, which I had overlooked.

That sounds like a good idea to me. I do like your approach of having it very explicitly as part of the configuration, but it is not a reusable list across workflow files. The backend approach would promote setting up this action on all cases and not just the core build (in my case, static analyzers). I also did not put it on the release workflow out of fear that it would break it. The downside is that it requires some login which for an OSS project could be lost if that maintainer disappears.

Overall a very easy and nice security enhancement - thank you! I have no complaints. 🙂

Thanks for the feedback. Your comment about not being reusable across workflows reminded be of this idea, which is to store allowed endpoints in a separate file in the repo, e.g. .github/harder-runner.yml. That could also be a way to disable source code overwrite warnings for some workflows/ jobs.

BTW, are the source code overwrites in the core build job expected? How can one differentiate between those overwrites and a malicious overwrite by ones of the steps? Would you expect the same/ similar overwrites in the release job as well?

@ben-manes
Copy link
Author

ben-manes commented May 1, 2022

This feels like a problem that there should be an idiomatic solution for. What about composite actions or reusable workflows so that users define their own variant of harden-runner configured with their preferred defaults? This would avoid any work on your side except for documenting the pattern.

The source code overwrite is a hack for my project and not idiomatic, so I think you're okay as is. I use a code generator to create classes for every configuration to favor reducing the memory footprint at the cost of a larger dependency (as unloaded classes are free at runtime). For example a cache without any expiration setting doesn't need a per-entry expiration timestamp field. These code generated classes are compiled without debug symbols as simple data holders, which shaves off ~150kb without impacting debuggability. The hack is that the code is compiled once in full and then the subset compiled again without debug, and those classes replaced. A better approach would be to make these independent compilation and combine them later, but that requires teasing them apart and not being so lazy with this ad hoc approach. In that respect your warnings are a good reminder that the build should be improved, as that also busts the build's cache and can cause it to rerun those tasks if invoked later.

@ben-manes
Copy link
Author

work in progress, but this clean up commit should avoid those overwrites. Oddly the compileCodeGenJava task is still being invalidated when used locally, so trying to figure out why it is not cached as expected.

@ben-manes
Copy link
Author

Okay, the build is improved this should be in place. The prior case of a rebuild is because the compiler itself depends on the cache for its AST and the build system incorrectly rewrote that to a project dependency after the jar was built.

However looking at a corrected build and despite the tasks reported as UP-TO-DATE (thus no work), the runner claims the generated files were overwritten. I don't have an explanation for this behavior.

@varunsh-coder
Copy link
Member

Okay, the build is improved this should be in place. The prior case of a rebuild is because the compiler itself depends on the cache for its AST and the build system incorrectly rewrote that to a project dependency after the jar was built.

However looking at a corrected build and despite the tasks reported as UP-TO-DATE (thus no work), the runner claims the generated files were overwritten. I don't have an explanation for this behavior.

I am making updates to correlate the file overwrite event with the step/ line that made the change based on timestamp. Once that is done, it will be visible in the insights website and should help with understanding why this is happening.

@varunsh-coder
Copy link
Member

This feels like a problem that there should be an idiomatic solution for. What about composite actions or reusable workflows so that users define their own variant of harden-runner configured with their preferred defaults? This would avoid any work on your side except for documenting the pattern.

The source code overwrite is a hack for my project and not idiomatic, so I think you're okay as is. I use a code generator to create classes for every configuration to favor reducing the memory footprint at the cost of a larger dependency (as unloaded classes are free at runtime). For example a cache without any expiration setting doesn't need a per-entry expiration timestamp field. These code generated classes are compiled without debug symbols as simple data holders, which shaves off ~150kb without impacting debuggability. The hack is that the code is compiled once in full and then the subset compiled again without debug, and those classes replaced. A better approach would be to make these independent compilation and combine them later, but that requires teasing them apart and not being so lazy with this ad hoc approach. In that respect your warnings are a good reminder that the build should be improved, as that also busts the build's cache and can cause it to rerun those tasks if invoked later.

Composite and reusable workflows are interesting options to solve this. I think composite workflows might create friction for onboarding and developers will likely prefer an easier option. Reusable workflows could work, I need to learn more about that. Will investigate...

@ben-manes
Copy link
Author

I am making updates to correlate the file overwrite event with the step/ line that made the change based on timestamp. Once that is done, it will be visible in the insights website and should help with understanding why this is happening.

That will be very useful, thanks!

A guess is that it runs the codegen task but observes that the output hasn't changed via hashing, and determines it is uptodate so it does not invalidate downstream tasks. Since the task is so inexpensive it would appear otherwise to have not done any work. That wouldn't be the very obvious behavior, but I could see it making sense as a tooling author as an optimization, differentiated by the SKIPPED if irrelevant (but actually means something else, but oh well). The tasks themselves don't indicate anything regarding how to determine if they are uptodate. There might be some task input I need to set to get it to avoid regenerating, if my hypothesis is correct.

@ben-manes
Copy link
Author

stats doesn't indicate any changes when running locally, and a scan does say it has savings but also that it is not cachable, so not sure.

@ben-manes
Copy link
Author

I got it. Fixing the task caching and now the runner does not report any overwritten files. This is a nice set of hygienic build cleanups, so I think it was a good prompt.

@varunsh-coder
Copy link
Member

I got it. Fixing the task caching and now the runner does not report any overwritten files. This is a nice set of hygienic build cleanups, so I think it was a good prompt.

Great! Thanks for the feedback!

@varunsh-coder
Copy link
Member

I am making updates to correlate the file overwrite event with the step/ line that made the change based on timestamp. Once that is done, it will be visible in the insights website and should help with understanding why this is happening.

That will be very useful, thanks!

A guess is that it runs the codegen task but observes that the output hasn't changed via hashing, and determines it is uptodate so it does not invalidate downstream tasks. Since the task is so inexpensive it would appear otherwise to have not done any work. That wouldn't be the very obvious behavior, but I could see it making sense as a tooling author as an optimization, differentiated by the SKIPPED if irrelevant (but actually means something else, but oh well). The tasks themselves don't indicate anything regarding how to determine if they are uptodate. There might be some task input I need to set to get it to avoid regenerating, if my hypothesis is correct.

I have not implemented the UI yet, but based on the logs, the overwrite is in this line.

Mon, 02 May 2022 08:41:38 GMT > Task :guava:compileTestJava UP-TO-DATE

The timestamp of overwrite is logged in Post Harden Runner step here.

[Source code overwritten] file: /home/runner/work/caffeine/caffeine/simulator/build/generated/sources/annotationProcessor/java/main/com/github/benmanes/caffeine/cache/simulator/report/AutoValue_Metrics.java syscall: openat by exe: /opt/hostedtoolcache/Java_Zulu_jdk/17.0.3-7/x64/bin/java [7e3d556664b0ff8f12650d53f957476669df45c99c790e8eb8eade18b55ed596] **Timestamp: 2022-05-02T08:41:42.485Z**

@ben-manes
Copy link
Author

It is actually the next line,

Task :simulator:compileJava

I dug into that last night and it is Google AutoValue breaking incremental compilation. While I’m using a fixed version since they don’t have tests it’s likely a regression.
google/auto#1075

@ben-manes
Copy link
Author

Opened a ticket google/auto#1315

@ben-manes
Copy link
Author

FYI this is fixed too (and my fault, of course). It was a really nice experience setting this up and fixing the issues highlighted. 😊

@varunsh-coder
Copy link
Member

FYI this is fixed too (and my fault, of course). It was a really nice experience setting this up and fixing the issues highlighted. 😊

Thanks for the update! I am curious if the exact step was Task :simulator:compileJava or Task :guava:compileTestJava UP-TO-DATE. I am still not sure why the correlation based on time is not accurate. The file write was reported at Task :guava:compileTestJava UP-TO-DATE.

@ben-manes
Copy link
Author

It was definitely the :simulator:compileJava task. Are you judging by epoch (seconds) resolution? There could easily be a race as the up-to-date check would be instant, so even millis might be too coarse.

@varunsh-coder
Copy link
Member

It was definitely the :simulator:compileJava task. Are you judging by epoch (seconds) resolution? There could easily be a race as the up-to-date check would be instant, so even millis might be too coarse.

As per the downloaded log archive, which has better precision, these are the timestamps:
2022-05-02T08:41:38.6610262Z > Task :guava:compileTestJava UP-TO-DATE
2022-05-02T08:41:53.3611024Z > Task :simulator:compileJava

As per harden-runner log, the first file overwrite was at 2022-05-02T08:41:42.485Z. So as per this info, Task :simulator:compileJava had not started at that time. Or may be workflow logs the Task after task is completed.

@ben-manes
Copy link
Author

oh yes, Gradle displays it after it completes for TERM=dumb mode of a CI. In a rich terminal it is an updated progress line.

@varunsh-coder
Copy link
Member

oh yes, Gradle displays it after it completes for TERM=dumb mode of a CI. In a rich terminal it is an updated progress line.

ok, that explains it. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants