Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update(engine)!: print stats to stderr #3333

Closed

Conversation

LucaGuerra
Copy link
Contributor

@LucaGuerra LucaGuerra commented Sep 18, 2024

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind cleanup

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area engine

What this PR does / why we need it:

We have noticed a non-deterministic testing failure:

{"deadline":180000000000,"level":"info","msg":"running falco with runner","time":"2024-09-18T10:58:52Z"}
{"cmd":"/usr/bin/falco -c stdout_output.yaml -o json_output=true -r single_rule_with_tags.yaml -o engine.kind=replay -o engine.replay.capture_file=cat_write.scap -o time_format_iso_8601=true -o json_include_output_property=true -o json_include_tags_property=true -o log_level=debug -o log_stderr=true -o log_syslog=false -o stdout_output.enabled=true","level":"debug","msg":"executing command","time":"2024-09-18T10:58:52Z"}
{"error":"error code 1","level":"warning","msg":"error running falco with runner","time":"2024-09-18T10:58:52Z"}
    legacy_test.go:156: 
        	Error Trace:	/home/runner/actions-runner/_work/_actions/falcosecurity/testing/main/legacy_test.go:156
        	Error:      	Input ('Events detected: 8') needs to be valid json.
        	            	JSON parsing error: 'invalid character 'E' looking for beginning of value'
        	Test:       	TestFalco_Legacy_StdoutOutputJsonStrict

The problem here is that when you shut down Falco (or when the end of output is reached) there is a stat message that is written to stdout and is not synchronized with the Falco output queue. Example:

Events detected: 8
Rule counts by severity:
   WARNING: 8
Triggered rules by rule name:
   open_from_cat: 8

This means that this output can be intermixed with the regular Falco output. If you were expecting json you will find something else. There are more messages that are printed but those are on stderr. I have discussed this with @jasondellaluce , and we agree that it makes sense to put all of that to stderr.

This is however a breaking change in some ways, so if someone (e.g. @leogr ) can think of a reason why we shouldn't do it maybe we need to better synchronize this, although it is not as trivial.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

update(engine)!: print stats to stderr

Signed-off-by: Luca Guerra <luca@guerra.sh>
Co-authored-by: Jason Dellaluce <jasondellaluce@gmail.com>
@poiana
Copy link
Contributor

poiana commented Sep 18, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LucaGuerra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

This PR may bring feature or behavior changes in the Falco engine and may require the engine version to be bumped.

Please double check userspace/engine/falco_engine_version.h file. See versioning for FALCO_ENGINE_VERSION.

/hold

@LucaGuerra
Copy link
Contributor Author

Not a blocker for release

/milestone 0.40.0

@poiana poiana added this to the 0.40.0 milestone Sep 18, 2024
@leogr
Copy link
Member

leogr commented Sep 18, 2024

First of all, really good catch!

That said, I would be okay with this solution even though I don't believe printing these stats to stderr makes too much sense.

Moreover, it seems that s.engine->print_stats(); just waits for all running event source threads and not for the output queue (the output worker is stopped even later when the falco_output class is destroyed). Before going with this solution, I'd evaluate if there may be some other hidden synchronization problem (and - perhaps - a low-hanging fruit to fix this with correct synchronization)

Okay, I got it. The actual problem is that the output worker is closed after all event source threads. If this is the case, wouldn't it be enough to destroy the outputs object (this will force the current thread to wait) before calling s.engine->print_stats();? This would also improve the UX since the state will always be printed after. 🤔

@LucaGuerra
Copy link
Contributor Author

@leogr ok, makes sense. We merged the suggested change in #3338 . Closing this for now.

@LucaGuerra LucaGuerra closed this Sep 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants