-
Notifications
You must be signed in to change notification settings - Fork 526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enforce max span attribute size #4335
base: main
Are you sure you want to change the base?
Conversation
modules/distributor/distributor.go
Outdated
|
||
func spanContainsAttributeTooLarge(span *v1.Span, maxAttrSize int) bool { | ||
for _, attr := range span.Attributes { | ||
if len(attr.Key) > maxAttrSize || attr.Value.Size() > maxAttrSize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, checking the attribute name. I'm wondering if this calculation should be against the total Key.Size() + Value.Size()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if key is max/2 and value is max/2, does it still cause a problem?
modules/distributor/distributor.go
Outdated
|
||
for _, b := range batches { | ||
spansByILS := make(map[uint32]*v1.ScopeSpans) | ||
|
||
for _, ils := range b.ScopeSpans { | ||
for _, span := range ils.Spans { | ||
// drop spans with attributes that are too large | ||
if spanContainsAttributeTooLarge(span, maxSpanAttrSize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is a way we could drop only the attribute that seems better than dropping the span (preserve as much information as possible). It doesn't fit as easily within our current metrics/logs though.
@@ -229,6 +229,10 @@ distributor: | |||
# instruct the client how to retry. | |||
[retry_after_on_resource_exhausted: <duration> | default = '0' ] | |||
|
|||
# Optional | |||
# Configures the max size a span attribute can be. Any span with at least one attribute over this limit would be discarded with reason "attribute_too_large" | |||
[max_span_attr_size: <int> | default = '10000'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking if this should be max_attr_bytes
to follow the pattern of other limits, and should it also apply to resource/event/link attributes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding docs :)
benchmark updated but not looking too good :/ |
modules/distributor/distributor.go
Outdated
case *v1_common.AnyValue_StringValue: | ||
if len(value.StringValue) > maxAttrSize { | ||
fmt.Printf("size: %d", len(value.StringValue)) | ||
value.StringValue = value.StringValue[:maxAttrSize] + "_truncated" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is quite expensive as it's going to alloc a completely new string every time.
sane but not as nice option:
just produce a metric that indicates number of truncated attributes
crazy but performant option:
rewrite the byte array underlying the string to contain "_truncated" at the right position
i don't think that would break anything in tempo but not 100% sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the benchmark for the scenario where every attribute needs to be truncated, adding the "_truncated" had a ~400+% degradation. And only 66% if we just truncate without adding the "_truncated". I'm in favor of just reporting the metrics without adding "_truncated". Wdyt?
benchmarks
4 traces, 300 batches and 5000 spans each run
with adding "_truncated"
│ before.out │ after_add_truncated.out │
│ sec/op │ sec/op vs base │
TestsByRequestID 402.3m ± 7% 2384.8m ± 10% +492.83% (p=0.000 n=10)
without adding "_truncated" to truncated attribute values
│ before.out │ after-notruncate.out │
│ sec/op │ sec/op vs base │
TestsByRequestID 357.1m ± 52% 594.0m ± 34% +66.33% (p=0.001 n=10)
│ before.out │ after-notruncate.out │
│ B/op │ B/op vs base │
TestsByRequestID 58.71Mi ± 0% 58.71Mi ± 0% ~ (p=1.000 n=10)
│ before.out │ after-notruncate.out │
│ allocs/op │ allocs/op vs base │
TestsByRequestID 5.100k ± 0% 5.100k ± 0% ~ (p=1.000 n=10) ¹
What this PR does: Add config for
max_span_attr_size
to enforce the max size a span attribute can be. If a span contains an attribute with either key or value over the limit is sent to Tempo, the attribute key/value will be truncated and marked with "_truncated".Which issue(s) this PR fixes:
Fixes #3985
benchmarks
4 traces, 300 batches and 5000 spans each run
with adding "_truncated"
without adding "_truncated" to truncated attribute values
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]