-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the req_id attribute on the decision logs #5196
Add the req_id attribute on the decision logs #5196
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good to me, but I'll defer to @srenatus for the architectural review. One thing I don't understand though — where/when is the actual request ID created? I only see it being referenced. Has that been done elsewhere? 😅
var ok bool | ||
|
||
if fieldvalue, ok = fields["client_addr"]; !ok { | ||
t.Fatal("Fields did not contain the client_addr field") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use Error rather than Fatal for these assertions, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used Fatal
with the intention of following the structure of other tests, but I agree we can use Error
and the test case run entirely.
Oh, I just realized — there is no documentation for this feature included in this PR. Since this is something new that will show up in logs, and decision logs, we'll need to document what the purpose of the field is. |
@anderseknert I will add the documentation in the next commit and change The request ID is generated here: https://github.com/humbertoc-silva/opa/blob/decision-log-req-id/runtime/logging.go#L58 |
c3daaca
to
22680ea
Compare
I see! Using an integer and auto-increment seems like it would risk making this less useful, or no? If you have 100 OPAs in a cluster, they are all going to have a "request_id": 0, 1, 2, 3, 4, 5, 6 ... etc, in their logs, and the counter is of course reset whenever an instance is restarted, so after a while you are going to see thousands of "request_id": 0 in your log aggregator... and if the purpose of this was to help correlate a request with other identifiers, would this help much with that? I know that wasn't part of your PR, just asking so it doesn't turn out to be unusable for what you intended it for 😅 |
@anderseknert maybe not. Using the example of 100 OPA instances: Instance 1 - Instance ID: 123 Instance 2 - Instance ID: 456 Today we can use some information related to the instance, the ID for example, to aggregate the logs by instance. If an instance reset I will consider the new instance ID and the request ID counting start again, as expected. To avoid this type of problem, the OPA request ID could be a unique value (GUID), so we wouldn't depend on extra values to aggregate the information. Does this make sense to you? I did not get this part:
When the request ID would equal to 0. |
f5bf30f
to
43c4c3b
Compare
Sorry I haven't gotten to this yet. I'll make sure to review this a bit more this week. |
43c4c3b
to
b9b5757
Compare
✅ Deploy Preview for openpolicyagent ready!
To edit notification comments on pull requests, go to your Netlify site settings. |
I was thinking that would be interesting if we had a new field on logs, e.g Example:
Could this be a future improvement for this PR? |
acc9178
to
66e91e0
Compare
There have been related issues, I think (#5008); I think what you propose there does make sense. I'm not certain about the potential overlap though -- there's a lot going on on OpenTelemetry, too, with span and trace id #5230... Let's get that discussion going in one of those issues, or a new one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it took me so long to review this -- please have a look at the inline comment 👇
@@ -76,6 +76,7 @@ Decision log updates contain the following fields: | |||
| `[_].erased` | `array[string]` | Set of JSON Pointers specifying fields in the event that were erased. | | |||
| `[_].masked` | `array[string]` | Set of JSON Pointers specifying fields in the event that were masked. | | |||
| `[_].nd_builtin_cache` | `object` | Key-value pairs of non-deterministic builtin names, paired with objects specifying the input/output mappings for each unique invocation of that builtin during policy evaluation. Intended for use in debugging and decision replay. Receivers will need to decode the JSON using Rego's JSON decoders. | | |||
| `[_].req_id` | `number` | Unique request identifier for the request that started the policy query. The attribute value is the same as the value present in others logs (request, response, and print) and could be used to correlate them all. This attribute will be included just when OPA runtime is initialized in server mode and the log level is equal or greater than info. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] "Unique" -- We should mention that it's incrementing, and unique only to the OPA instance. Not globally unique or some such thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
logging/logging.go
Outdated
type requestContextKey string | ||
|
||
// ReqCtxKey is the key used to access the request context. | ||
const ReqCtxKey = requestContextKey("request-context-key") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exporting this is defeating the purpose here: It's done via non-exported type to ensure that we control how it's set and read.
Alternatively, we could keep reqCtxKey
unexported and expose a method like
func WithRequestID(ctx context.Context, reqID string) context.Context
that injects the request ID accordingly, and
func RequestIDFromContext(context.Context) string
or something like that for retrieval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Ok, @srenatus |
Today it is not possible to correlate the decision log with other types of logs (server, print, etc.) when the server log level is >= INFO. The log correlation could be helpful in troubleshooting. A solution is to add a common attribute in all logs to make the log correlation possible, so adding the req_id attribute on decision logs, when server log level is >= INFO, will make it possible. Fixes: open-policy-agent#5006 Signed-off-by: Humberto Corrêa da Silva <humbertoc_silva@hotmail.com>
The documentation purpose is to explain the relation with others logs, how it could be used, and when it is included on decision logs. Fixes: open-policy-agent#5006 Signed-off-by: Humberto Corrêa da Silva <humbertoc_silva@hotmail.com>
Explain that request ID is incremental and unique by OPA instance. Define RequestContext key as an unexported const to avoid colision and provide RequestContext type-safe accessors. More info: https://pkg.go.dev/context#Context.Value Fixes: open-policy-agent#5006 Signed-off-by: Humberto Corrêa da Silva <humbertoc_silva@hotmail.com>
66e91e0
to
02afff9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. WDYT @anderseknert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
…pen-policy-agent#5196) Today it is not possible to correlate the decision log with other types of logs (server, print, etc.) when the server log level is >= INFO. The log correlation could be helpful in troubleshooting. A solution is to add a common attribute in all logs to make the log correlation possible, so adding the req_id attribute on decision logs, when server log level is >= INFO, will make it possible. Fixes: open-policy-agent#5006 * Add documentation about decision log req_id attribute The documentation purpose is to explain the relation with others logs, how it could be used, and when it is included on decision logs. Signed-off-by: Humberto Corrêa da Silva <humbertoc_silva@hotmail.com> Signed-off-by: Byron Lagrone <byron.lagrone@seqster.com>
Today it is not possible to correlate the decision log with other types of logs (server, print, etc.) when the server log level is >= INFO. The log correlation could be helpful in troubleshooting.
A solution is to add a common attribute in all logs to make the log correlation possible, so adding the req_id attribute on decision logs, when server log level is >= INFO, will make it possible.
Fixes: #5006
Signed-off-by: Humberto Corrêa da Silva humbertoc_silva@hotmail.com