feat(eap): Start decoupling EAP entities at the entity layer #6701
base: master
We also shoul
snuba/datasets/configuration/events_analytics_platform/entities/eap_spans_rpc.yaml
```yaml
[
  { name: service, type: String },
  { name: trace_id, type: UUID },
  { name: span_id, type: UInt, args: { size: 64 } },
```
We should have some sort of `event_id`, as a string, required in the RPC. Having it as a string would let us pass any sort of UUID or span ID.
How about `UInt128`? A string would have a big performance hit.
Let's add that later; it would require another net-new processor.
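The `UInt128` suggestion works because both a UUID (128 bits) and a 64-bit span ID fit into a single unsigned 128-bit value. A minimal sketch, assuming UUIDs arrive in canonical dashed form and span IDs as hex strings; `event_id_to_uint128` is a hypothetical name, not part of Snuba:

```python
import uuid

UINT128_MAX = (1 << 128) - 1


def event_id_to_uint128(raw: str) -> int:
    """Hypothetical helper: normalize a UUID or a hex span ID to one UInt128 value.

    Assumption: UUIDs contain dashes, span IDs are plain hex (e.g. 16 chars = 64 bits).
    """
    if "-" in raw:
        value = uuid.UUID(raw).int  # a UUID is exactly 128 bits
    else:
        value = int(raw, 16)  # a 64-bit span ID always fits in 128 bits
    if not 0 <= value <= UINT128_MAX:
        raise ValueError(f"{raw!r} does not fit in UInt128")
    return value
```

A fixed-width integer column avoids variable-length string comparisons, which is presumably the performance concern raised in the thread.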
Thanks for working on this. I left a few comments.
```
@@ -229,41 +227,6 @@ def attempt_map(
return None


@dataclass(frozen=True)
class SubscriptableHashBucketMapper(SubscriptableReferenceMapper):
```
This gets moved into a query processor (the same one that handles `mapKeys` and `mapContains`).
```
@@ -88,11 +74,16 @@ query_processors:
curried_aggregation_names:
- quantile
- quantileTDigestWeighted
- processor: HashBucketFunctionTransformer
```
There was a really annoying order-of-operations problem: query processors needed to know which bucket an attribute would end up in, but bucketing was done at the storage level.
By merging all of the processors that need actual bucket-level information into a single processor at the end, the pipeline is much easier to understand and less prone to bugs.
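The consolidation described above can be sketched as one final pass that is the only code aware of bucketing, while earlier processors work on logical maps such as `attr_str`. This is a minimal sketch, not the PR's `HashBucketFunctionTransformer`: the bucket count of 20, the FNV-1a hash, and the string-based output are all assumptions made for illustration.

```python
from typing import Optional

# Assumption: each logical map is split into 20 physical columns (attr_str_0 .. attr_str_19).
NUM_BUCKETS = 20


def fnv_1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash; a stand-in for whatever hash the storage really uses."""
    h = 0x811C9DC5  # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # multiply by the FNV prime, keep 32 bits
    return h


def bucket_column(map_name: str, key: str) -> str:
    """Physical column that holds `key` for the logical map `map_name`."""
    return f"{map_name}_{fnv_1a_32(key.encode('utf-8')) % NUM_BUCKETS}"


def resolve_buckets(func: str, map_name: str, key: Optional[str] = None) -> str:
    """Final pass: the only place that knows how attributes are bucketed."""
    if func in ("subscript", "mapContains") and key is None:
        raise ValueError(f"{func} needs an attribute key")
    if func == "subscript":  # attr_str['k']
        return f"{bucket_column(map_name, key)}['{key}']"
    if func == "mapContains":  # mapContains(attr_str, 'k')
        return f"mapContains({bucket_column(map_name, key)}, '{key}')"
    if func == "mapKeys":  # mapKeys(attr_str): no key to hash, so scan every bucket
        parts = ", ".join(f"mapKeys({map_name}_{i})" for i in range(NUM_BUCKETS))
        return f"arrayConcat({parts})"
    raise ValueError(f"unsupported function: {func}")
```

Because every bucket-aware rewrite lives in one function, the processors that run before it never need to agree on the bucket layout.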
```yaml
storage_selector:
  selector: DefaultQueryStorageSelector

query_processors:
```
I rewrote a lot of this; there were starting to be conflicting edge cases. For example, `sum(attr_f64[sentry.duration_ms])` should become `sum(duration_ms)`, but the very similar `sum(attr_i64[blah])` should become `sumIf(CAST(attr_num_2[blah], 'Integer'), mapContains(attr_num_2, 'blah'))`.
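The conflicting cases above can be expressed as ordered rules where the special case (an attribute promoted to a real column) is checked before the generic map-access rule. A minimal sketch, assuming a hypothetical `PROMOTED_COLUMNS` table and taking the bucket index as an input; the non-promoted float branch is a guess by analogy, since only the integer form is given above.

```python
from typing import Dict

# Hypothetical table of attributes stored in dedicated columns.
PROMOTED_COLUMNS: Dict[str, str] = {"sentry.duration_ms": "duration_ms"}


def rewrite_sum(attr_type: str, key: str, bucket: int) -> str:
    """Rewrite sum(<attr_type>[key]) into a storage-level expression.

    `bucket` is the hash bucket holding `key`; computing it is out of scope here.
    """
    # Rule 1 (checked first): promoted float attributes read the real column.
    if attr_type == "attr_f64" and key in PROMOTED_COLUMNS:
        return f"sum({PROMOTED_COLUMNS[key]})"
    column = f"attr_num_{bucket}"
    value = f"{column}['{key}']"
    # Rule 2: integers live in the shared numeric map, so cast them back.
    if attr_type == "attr_i64":
        value = f"CAST({value}, 'Integer')"
    elif attr_type != "attr_f64":
        raise ValueError(f"unsupported attribute type: {attr_type}")
    # Only aggregate rows where the attribute is actually present.
    return f"sumIf({value}, mapContains({column}, '{key}'))"
```

Keeping the rules in one ordered function makes the precedence between the two cases explicit instead of depending on which processor happens to run first.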
Someone on Pierre's team will be finishing this PR.
There's currently a big chunk of code in `common.py` that maps column accesses to the correct place. However, it's all hard-coded to the spans table right now.
We would like to add new non-span entity types to the EAP RPCs, and the easiest place to do that is at the entity layer.
This essentially hides the 'real columns' behind a well-known set of columns (`organization_id`, `attr_str`, `attr_f64`, `attr_i64`, ...) which will be shared across all EAP entities. When we add new entities, we can specify in the entity YAML what maps where, in a way that the RPC doesn't have to know what type of entity is being processed.
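The idea can be sketched as a per-entity mapping that RPC code consults without knowing which entity it is talking to. A minimal sketch: `EntityColumnMapping`, `org_filter`, and both configs are hypothetical stand-ins for what the entity YAML would declare, and hash bucketing is left to a later pass.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class EntityColumnMapping:
    """Hypothetical stand-in for the column mapping an entity YAML would declare."""

    entity: str
    # Well-known column -> real storage column.
    columns: Dict[str, str]
    # Well-known attribute map -> real storage map (bucketing happens later).
    attribute_maps: Dict[str, str]
    # (well-known map, attribute key) -> dedicated storage column.
    promoted: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def resolve(self, name: str, key: Optional[str] = None) -> str:
        """Translate a well-known column or attribute access into this entity's storage."""
        if key is None:
            return self.columns[name]
        if (name, key) in self.promoted:
            return self.promoted[(name, key)]
        return f"{self.attribute_maps[name]}['{key}']"


def org_filter(mapping: EntityColumnMapping, org_id: int) -> str:
    """RPC-side code: written only against well-known names, for any entity."""
    return f"{mapping.resolve('organization_id')} = {org_id}"


# Hypothetical configs: a spans-like layout and an invented non-span entity.
SPANS = EntityColumnMapping(
    entity="spans",
    columns={"organization_id": "organization_id"},
    attribute_maps={"attr_str": "attr_str", "attr_f64": "attr_num", "attr_i64": "attr_num"},
    promoted={("attr_f64", "sentry.duration_ms"): "duration_ms"},
)
OTHER = EntityColumnMapping(
    entity="other_entity",
    columns={"organization_id": "org_id"},
    attribute_maps={"attr_str": "string_attrs"},
)
```

Under this design, adding an entity means adding one more mapping; `org_filter` and any other RPC code stay unchanged.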
Works towards solving https://github.com/getsentry/eap-planning/issues/126