Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the req_id attribute on the decision logs #5196

Merged
merged 4 commits into from
Oct 14, 2022

Conversation

humbertoc-silva
Copy link
Contributor

Today it is not possible to correlate the decision log with other types of logs (server, print, etc.) when the server log level is >= INFO. The log correlation could be helpful in troubleshooting.

A solution is to add a common attribute in all logs to make the log correlation possible, so adding the req_id attribute on decision logs, when server log level is >= INFO, will make it possible.

Fixes: #5006

Signed-off-by: Humberto Corrêa da Silva humbertoc_silva@hotmail.com

Copy link
Member

@anderseknert anderseknert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me, but I'll defer to @srenatus for the architectural review. One thing I don't understand though — where/when is the actual request ID created? I only see it being referenced. Has that been done elsewhere? 😅

var ok bool

if fieldvalue, ok = fields["client_addr"]; !ok {
t.Fatal("Fields did not contain the client_addr field")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use Error rather than Fatal for these assertions, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used Fatal with the intention of following the structure of other tests, but I agree we can use Error and the test case run entirely.

@anderseknert
Copy link
Member

Oh, I just realized — there is no documentation for this feature included in this PR. Since this is something new that will show up in logs, and decision logs, we'll need to document what the purpose of the field is.

@humbertoc-silva
Copy link
Contributor Author

@anderseknert I will add the documentation in the next commit and change Fatal to Error too.

The request ID is generated here: https://github.com/humbertoc-silva/opa/blob/decision-log-req-id/runtime/logging.go#L58

@anderseknert
Copy link
Member

I see! Using an integer and auto-increment seems like it would risk making this less useful, or no? If you have 100 OPAs in a cluster, they are all going to have a "request_id": 0, 1, 2, 3, 4, 5, 6 ... etc, in their logs, and the counter is of course reset whenever an instance is restarted, so after a while you are going to see thousands of "request_id": 0 in your log aggregator... and if the purpose of this was to help correlate a request with other identifiers, would this help much with that? I know that wasn't part of your PR, just asking so it doesn't turn out to be unusable for what you intended it for 😅

@humbertoc-silva
Copy link
Contributor Author

humbertoc-silva commented Sep 29, 2022

@anderseknert maybe not. Using the example of 100 OPA instances:

Instance 1 - Instance ID: 123
req_id: 1 (request, response, print, decision_log)
req_id: 2 (request, response, print, decision_log)
and so on...

Instance 2 - Instance ID: 456
req_id: 1 (request, response, print, decision_log)
req_id: 2 (request, response, print, decision_log)
and so on...

Today we can use some information related to the instance, the ID for example, to aggregate the logs by instance. If an instance reset I will consider the new instance ID and the request ID counting start again, as expected.

To avoid this type of problem, the OPA request ID could be a unique value (GUID), so we wouldn't depend on extra values to aggregate the information. Does this make sense to you?

I did not get this part:

so after a while you are going to see thousands of "request_id": 0 in your log aggregator...

When the request ID would equal to 0.

@humbertoc-silva humbertoc-silva force-pushed the decision-log-req-id branch 3 times, most recently from f5bf30f to 43c4c3b Compare October 5, 2022 13:12
@srenatus
Copy link
Contributor

srenatus commented Oct 5, 2022

Sorry I haven't gotten to this yet. I'll make sure to review this a bit more this week.

@netlify
Copy link

netlify bot commented Oct 5, 2022

Deploy Preview for openpolicyagent ready!

Name Link
🔨 Latest commit 9b6a4e4
🔍 Latest deploy log https://app.netlify.com/sites/openpolicyagent/deploys/634929fcb9d0a9000970cb3e
😎 Deploy Preview https://deploy-preview-5196--openpolicyagent.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@humbertoc-silva
Copy link
Contributor Author

humbertoc-silva commented Oct 5, 2022

I was thinking that would be interesting if we had a new field on logs, e.g correlation_id, dedicated for this purpose or continue to use req_id. The value could be a unique value (GUID), by default OPA could generate a value but this could be configurable too.

Example:

  • Default: OPA will generate a unique value on each request.
  • Via configuration: OPA will read a specific header (e.g X-Correlation-ID, X-Request-ID, etc.) and use it on all logs so that we could correlate external logs too.

Could this be a future improvement for this PR?

@humbertoc-silva humbertoc-silva force-pushed the decision-log-req-id branch 3 times, most recently from acc9178 to 66e91e0 Compare October 11, 2022 00:11
@srenatus
Copy link
Contributor

I was thinking that would be interesting if we had a new field on logs, e.g correlation_id, dedicated for this purpose or continue to use req_id. The value could be a unique value (GUID), by default OPA could generate a value but this could be configurable too.

Example:

  • Default: OPA will generate a unique value on each request.
  • Via configuration: OPA will read a specific header (e.g X-Correlation-ID, X-Request-ID, etc.) and use it on all logs so that we could correlate external logs too.

Could this be a future improvement for this PR?

There have been related issues, I think (#5008); I think what you propose there does make sense. I'm not certain about the potential overlap though -- there's a lot going on on OpenTelemetry, too, with span and trace id #5230... Let's get that discussion going in one of those issues, or a new one.

Copy link
Contributor

@srenatus srenatus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took me so long to review this -- please have a look at the inline comment 👇

@@ -76,6 +76,7 @@ Decision log updates contain the following fields:
| `[_].erased` | `array[string]` | Set of JSON Pointers specifying fields in the event that were erased. |
| `[_].masked` | `array[string]` | Set of JSON Pointers specifying fields in the event that were masked. |
| `[_].nd_builtin_cache` | `object` | Key-value pairs of non-deterministic builtin names, paired with objects specifying the input/output mappings for each unique invocation of that builtin during policy evaluation. Intended for use in debugging and decision replay. Receivers will need to decode the JSON using Rego's JSON decoders. |
| `[_].req_id` | `number` | Unique request identifier for the request that started the policy query. The attribute value is the same as the value present in others logs (request, response, and print) and could be used to correlate them all. This attribute will be included just when OPA runtime is initialized in server mode and the log level is equal or greater than info. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] "Unique" -- We should mention that it's incrementing, and unique only to the OPA instance. Not globally unique or some such thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

type requestContextKey string

// ReqCtxKey is the key used to access the request context.
const ReqCtxKey = requestContextKey("request-context-key")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exporting this is defeating the purpose here: It's done via non-exported type to ensure that we control how it's set and read.

Alternatively, we could keep reqCtxKey unexported and expose a method like

func WithRequestID(ctx context.Context, reqID string) context.Context

that injects the request ID accordingly, and

func RequestIDFromContext(context.Context) string

or something like that for retrieval.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@humbertoc-silva
Copy link
Contributor Author

I was thinking that would be interesting if we had a new field on logs, e.g correlation_id, dedicated for this purpose or continue to use req_id. The value could be a unique value (GUID), by default OPA could generate a value but this could be configurable too.
Example:

  • Default: OPA will generate a unique value on each request.
  • Via configuration: OPA will read a specific header (e.g X-Correlation-ID, X-Request-ID, etc.) and use it on all logs so that we could correlate external logs too.

Could this be a future improvement for this PR?

There have been related issues, I think (#5008); I think what you propose there does make sense. I'm not certain about the potential overlap though -- there's a lot going on on OpenTelemetry, too, with span and trace id #5230... Let's get that discussion going in one of those issues, or a new one.

Ok, @srenatus
I need to start some tests with OPA and OpenTelemetry to see how things work to contribute to discussions.

Today it is not possible to correlate the decision log with
other types of logs (server, print, etc.) when the server log
level is >= INFO. The log correlation could be helpful in
troubleshooting.

A solution is to add a common attribute in all logs to make
the log correlation possible, so adding the req_id attribute
on decision logs, when server log level is >= INFO, will make it
possible.

Fixes: open-policy-agent#5006

Signed-off-by: Humberto Corrêa da Silva <humbertoc_silva@hotmail.com>
The documentation purpose is to explain the relation with
others logs, how it could be used, and when it is included
on decision logs.

Fixes: open-policy-agent#5006

Signed-off-by: Humberto Corrêa da Silva <humbertoc_silva@hotmail.com>
Explain that request ID is incremental and unique by OPA instance.

Define RequestContext key as an unexported const to avoid colision and
provide RequestContext type-safe accessors.
More info: https://pkg.go.dev/context#Context.Value

Fixes: open-policy-agent#5006

Signed-off-by: Humberto Corrêa da Silva <humbertoc_silva@hotmail.com>
Copy link
Contributor

@srenatus srenatus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. WDYT @anderseknert?

Copy link
Member

@anderseknert anderseknert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@srenatus srenatus merged commit f1761ac into open-policy-agent:main Oct 14, 2022
@humbertoc-silva humbertoc-silva deleted the decision-log-req-id branch October 17, 2022 17:04
byronic pushed a commit to byronic/opa that referenced this pull request Oct 17, 2022
…pen-policy-agent#5196)

Today it is not possible to correlate the decision log with
other types of logs (server, print, etc.) when the server log
level is >= INFO. The log correlation could be helpful in
troubleshooting.

A solution is to add a common attribute in all logs to make
the log correlation possible, so adding the req_id attribute
on decision logs, when server log level is >= INFO, will make it
possible.

Fixes: open-policy-agent#5006

* Add documentation about decision log req_id attribute

The documentation purpose is to explain the relation with
others logs, how it could be used, and when it is included
on decision logs.

Signed-off-by: Humberto Corrêa da Silva <humbertoc_silva@hotmail.com>
Signed-off-by: Byron Lagrone <byron.lagrone@seqster.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add the req_id attribute on the decision logs
3 participants