feat(eap): Start decoupling EAP entities at the entity layer #6701

Open
wants to merge 12 commits into
base: master

colin-sentry
Member

@colin-sentry colin-sentry commented Dec 27, 2024

There's currently a big chunk of code in common.py that maps column accesses to the correct place.

However, it's all hard-coded to the spans table right now.

We would like to add new non-span entity types to the EAP RPCs, and the easiest place to do that is at the entity layer.

This essentially hides the 'real columns' behind a well-known set of columns (organization_id, attr_str, attr_f64, attr_i64, ...) which will be shared across all EAP entities.

When we add new entities, we can specify in the entity YAML what maps where in a way that the RPC doesn't have to know what type of entity is being processed.
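
As a rough illustration of the idea (not code from this PR), the sketch below resolves the shared well-known column names against a per-entity mapping. `EntityColumnMap`, the `example_logs` entity, and the physical column names on the right-hand side are all hypothetical; in practice the mapping would come from the entity YAML.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityColumnMap:
    """Hypothetical mapping from well-known EAP columns to an entity's real columns."""

    entity: str
    columns: dict[str, str]

    def resolve(self, well_known: str) -> str:
        # Fail loudly rather than silently querying a column the entity lacks.
        if well_known not in self.columns:
            raise KeyError(f"{self.entity} does not expose {well_known!r}")
        return self.columns[well_known]


# Illustrative mappings only; the real ones would be declared per entity.
SPANS = EntityColumnMap(
    "eap_spans",
    {
        "organization_id": "organization_id",
        "attr_str": "attr_str",
        "attr_f64": "attr_num",
        "attr_i64": "attr_num",
    },
)
LOGS = EntityColumnMap(
    "example_logs",  # hypothetical non-span entity
    {
        "organization_id": "org_id",
        "attr_str": "string_attributes",
    },
)

# The caller only ever names well-known columns, regardless of entity.
for entity_map in (SPANS, LOGS):
    print(entity_map.entity, "->", entity_map.resolve("attr_str"))
```

With this shape, the RPC code never branches on the entity type; it only asks for `attr_str`, `attr_f64`, and so on.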

Works towards solving https://github.com/getsentry/eap-planning/issues/126

Contributor

@phacops phacops left a comment

We also shoul

[
{ name: service, type: String },
{ name: trace_id, type: UUID },
{ name: span_id, type: UInt, args: { size: 64 } },
Contributor

We should have some sort of event_id, as a string, required for the RPC. Having it as a string would let us pass any kind of UUID or span ID.

Member Author

How about uint128? String would have a big performance hit
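
A quick sketch of why 128 bits would be enough for both cases, assuming the IDs arrive hex-encoded (a 32-hex-digit UUID, with or without dashes, or a 16-hex-digit span ID). `id_to_uint128` is a hypothetical helper, not something in this PR.

```python
import uuid


def id_to_uint128(raw: str) -> int:
    """Parse a hex-encoded ID into an integer that fits in an unsigned 128-bit column."""
    value = int(raw.replace("-", ""), 16)
    if value >= 1 << 128:
        raise ValueError("ID does not fit in 128 bits")
    return value


# A UUID uses the full 128 bits; a 64-bit span ID simply leaves the top half zero.
trace_like = id_to_uint128("12345678-1234-5678-1234-567812345678")
span_like = id_to_uint128("ffffffffffffffff")
print(trace_like.bit_length() <= 128, span_like.bit_length() <= 64)
```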

Member Author

Let's add that later; it would require another net-new processor.

@phacops phacops requested a review from a team January 3, 2025 21:53
Member

@onkar onkar left a comment

Thanks for working on this. I left a few comments.

@@ -229,41 +227,6 @@ def attempt_map(
return None


@dataclass(frozen=True)
class SubscriptableHashBucketMapper(SubscriptableReferenceMapper):
Member Author

This gets moved into a query processor (the same one which handles mapKeys and mapContains)

@@ -88,11 +74,16 @@ query_processors:
curried_aggregation_names:
- quantile
- quantileTDigestWeighted
- processor: HashBucketFunctionTransformer
Member Author

There was a really annoying order-of-operations problem: query processors needed to know which bucket things would end up in, but that was decided at the storage level.

By merging all of the processors that need the actual bucket-level information into a single one at the end, the pipeline is a lot more understandable and leaves less room for bugs.
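
For readers unfamiliar with the bucketing, here is a minimal sketch of what "knowing the bucket" means. The hash function (32-bit FNV-1a here), the bucket count, and the `bucketed_column` helper are assumptions for illustration; the thread does not state which hash or how many buckets the real processor uses.

```python
NUM_BUCKETS = 20  # assumed bucket count, for illustration only


def fnv_1a_32(data: bytes) -> int:
    """Standard 32-bit FNV-1a hash."""
    h = 0x811C9DC5  # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # FNV prime, truncated to 32 bits
    return h


def bucketed_column(prefix: str, key: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Pick the physical map column that holds `key`, e.g. attr_str -> attr_str_<n>."""
    return f"{prefix}_{fnv_1a_32(key.encode('utf-8')) % num_buckets}"


# Every processor that needs bucket-level detail can run after this single step.
print(bucketed_column("attr_str", "blah"))
```

Keeping this lookup in one late processor means earlier processors can keep working with the logical `attr_str[...]` form.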

storage_selector:
selector: DefaultQueryStorageSelector

query_processors:
Member Author

@colin-sentry colin-sentry Jan 7, 2025

I rewrote a lot of this. Conflicting edge cases were starting to pile up: for example, sum(attr_f64[sentry.duration_ms]) should become sum(duration_ms), but the very similar

sum(attr_i64[blah]) should become

sumIf(CAST(attr_num_2[blah], 'Integer'), mapContains(attr_num_2, 'blah'))
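
The two cases above can be sketched as a single rewrite function. This is a simplified illustration, not the processor from this PR: `NORMALIZED` is a hypothetical table of attributes stored as real columns, the bucket number is passed in rather than computed, and the output mirrors the notation used in this comment.

```python
# Hypothetical table of attributes that live in dedicated columns.
NORMALIZED = {"sentry.duration_ms": "duration_ms"}


def rewrite_sum(column: str, key: str, bucket: int) -> str:
    """Rewrite sum(<column>[<key>]) into the expression run against storage."""
    if column == "attr_f64" and key in NORMALIZED:
        # Normalized attribute: aggregate the dedicated column directly.
        return f"sum({NORMALIZED[key]})"
    if column == "attr_i64":
        # Bucketed attribute: cast the map value and skip rows missing the key.
        physical = f"attr_num_{bucket}"
        return (
            f"sumIf(CAST({physical}[{key}], 'Integer'), "
            f"mapContains({physical}, '{key}'))"
        )
    # Other combinations are not covered by the examples in this thread.
    raise NotImplementedError(f"no rewrite sketched for {column}[{key}]")


print(rewrite_sum("attr_f64", "sentry.duration_ms", bucket=0))
print(rewrite_sum("attr_i64", "blah", bucket=2))
```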

@colin-sentry
Member Author

colin-sentry commented Jan 8, 2025

Someone on Pierre's team will be finishing this PR.

3 participants