wip: Add EndpointTraceItemStats RPC #6809

Draft
wants to merge 5 commits into
base: master

shruthilayaj
Member

@shruthilayaj shruthilayaj commented Jan 23, 2025

requires getsentry/sentry-protos#101

For comparison workflows, we need the count distribution of all attribute/value pairs (both string and numeric, though we'll focus on strings first). We're introducing a trace item stats endpoint to provide that. The aim is to build a somewhat generic endpoint that gives a high-level overview of the shape of your data.

This implementation is fairly simple and restricted to string attributes to begin with. It returns the frequency distribution of attribute values for the items matching the requested filter. The returned data is ordered so that keys with the highest-frequency attribute values come first.
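As a rough illustration of the kind of distribution described above, here is a minimal in-memory sketch. It is not the endpoint's implementation: the function names, the sample data, and the exact ordering rule (keys ranked by the count of their most frequent value) are assumptions made for the example.

```python
from collections import Counter, defaultdict


def attribute_value_counts(items):
    """Count how often each (key, value) pair occurs across trace items."""
    counts = Counter()
    for attrs in items:
        for key, value in attrs.items():
            counts[(key, value)] += 1
    return counts


def group_by_key(counts):
    """Group counts per key, most frequent values (and keys) first."""
    by_key = defaultdict(list)
    for (key, value), n in counts.items():
        by_key[key].append((value, n))
    for buckets in by_key.values():
        # Most frequent value first; break ties alphabetically.
        buckets.sort(key=lambda b: (-b[1], b[0]))
    # Assumed ordering: keys ranked by the count of their top value.
    return sorted(by_key.items(), key=lambda kv: (-kv[1][0][1], kv[0]))


# Hypothetical string attributes for four spans.
spans = [
    {"browser": "chrome", "region": "us"},
    {"browser": "chrome", "region": "eu"},
    {"browser": "firefox", "region": "us"},
    {"browser": "chrome"},
]
distribution = group_by_key(attribute_value_counts(spans))
```

Here `browser` sorts ahead of `region` because its top value (`chrome`) occurs more often than any `region` value.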

Comment on lines +108 to +135
concat_attr_maps = FunctionCall(
    alias="attr_str_concat",
    function_name="mapConcat",
    parameters=tuple(column(f"attr_str_{i}") for i in range(ATTRIBUTE_BUCKETS)),
)
attrs_string_keys = tupleElement(
    "attr_key", arrayJoin("attr_str", concat_attr_maps), Literal(None, 1)
)
attrs_string_values = tupleElement(
    "attr_value",
    arrayJoin("attr_str", concat_attr_maps),
    Literal(None, 2),
)

selected_columns = [
    SelectedExpression(
        name="attr_key",
        expression=attrs_string_keys,
    ),
    SelectedExpression(
        name="attr_value",
        expression=attrs_string_values,
    ),
    SelectedExpression(
        name=aggregation.label,
        expression=aggregation_to_expression(aggregation),
    ),
]
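
For readers less familiar with the ClickHouse functions used in this snippet, the sketch below mimics in plain Python what `mapConcat` followed by `arrayJoin` does to each row: merge the per-bucket maps, then emit one output row per key/value pair. The helper names, the `span_id` field, and the two-bucket setup are hypothetical, and the sketch assumes each key lives in exactly one bucket, so it does not model duplicate keys.

```python
ATTRIBUTE_BUCKETS = 2  # illustrative only; the generated query uses attr_str_0..attr_str_19


def map_concat(*maps):
    """Merge several bucket maps into one (assumes keys are disjoint)."""
    merged = {}
    for m in maps:
        merged.update(m)
    return merged


def explode(rows):
    """Yield one (span_id, key, value) tuple per attribute, like arrayJoin."""
    for row in rows:
        merged = map_concat(
            *(row[f"attr_str_{i}"] for i in range(ATTRIBUTE_BUCKETS))
        )
        # A row whose merged map is empty yields nothing, as with arrayJoin.
        for key, value in merged.items():
            yield (row["span_id"], key, value)


# Hypothetical rows with attributes spread over two buckets.
rows = [
    {"span_id": "a", "attr_str_0": {"browser": "chrome"}, "attr_str_1": {"region": "us"}},
    {"span_id": "b", "attr_str_0": {}, "attr_str_1": {"region": "eu"}},
    {"span_id": "c", "attr_str_0": {}, "attr_str_1": {}},
]
exploded = list(explode(rows))
```

Each exploded tuple corresponds to one row that the `GROUP BY` on key and value then aggregates.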
Member Author

SELECT
    (arrayJoin(mapConcat(attr_str_0, attr_str_1, attr_str_2, attr_str_3, attr_str_4, attr_str_5, attr_str_6, attr_str_7, attr_str_8, attr_str_9, attr_str_10, attr_str_11, attr_str_12, attr_str_13, attr_str_14, attr_str_15, attr_str_16, attr_str_17, attr_str_18, attr_str_19) AS attr_str_concat) AS attr_str).1 AS attr_keys,
    attr_str.2 AS attr_values,
    sumIf(sign * sampling_weight, ((duration_micro / 1000) AS `sentry.duration_ms`) IS NOT NULL) AS count
FROM eap_spans_2_local
WHERE (project_id IN [1, 2, 3]) AND (organization_id = 1) AND less(_sort_timestamp, toDateTime(1737500400)) AND greaterOrEquals(_sort_timestamp, toDateTime(1737489600)) AND true
GROUP BY
    attr_keys,
    attr_values
ORDER BY count DESC
LIMIT 10 BY attr_keys
LIMIT 0, 100

The resulting query looks like this ^
https://www.notion.so/sentry/Naive-approach-for-stats-endpoint-1848b10e4b5d80708bcffa341e0f225a
I ran this query locally and in Snuba admin and documented the results in the Notion doc linked above. Snuba admin is being a little buggy, so I couldn't do anything more comprehensive.
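
To make the aggregation and limiting steps of the query above easier to follow, here is a small Python sketch of `sumIf(sign * sampling_weight, ...)`, `ORDER BY count DESC`, `LIMIT n BY attr_keys`, and the final `LIMIT offset, n`. It operates on rows that have already been exploded into key/value pairs. The function names, field names, and sample rows are assumptions for illustration, not Snuba code.

```python
from collections import defaultdict


def weighted_counts(rows):
    """Sum sign * sampling_weight per (key, value), like sumIf in the query."""
    totals = defaultdict(float)
    for row in rows:
        group = (row["attr_key"], row["attr_value"])
        # sumIf only adds rows where the condition holds; the group still exists.
        if row["duration_ms"] is not None:
            totals[group] += row["sign"] * row["sampling_weight"]
        else:
            totals[group] += 0.0
    return totals


def limit_by(totals, per_key=10, offset=0, limit=100):
    """Apply ORDER BY count DESC, then LIMIT per_key BY key, then LIMIT."""
    ordered = sorted(totals.items(), key=lambda kv: -kv[1])
    seen = defaultdict(int)
    kept = []
    for (key, value), count in ordered:
        if seen[key] < per_key:
            seen[key] += 1
            kept.append((key, value, count))
    return kept[offset:offset + limit]


# Hypothetical exploded rows.
rows = [
    {"attr_key": "browser", "attr_value": "chrome", "sign": 1, "sampling_weight": 10.0, "duration_ms": 12.0},
    {"attr_key": "browser", "attr_value": "chrome", "sign": 1, "sampling_weight": 5.0, "duration_ms": 3.0},
    {"attr_key": "browser", "attr_value": "firefox", "sign": 1, "sampling_weight": 4.0, "duration_ms": 7.0},
    {"attr_key": "browser", "attr_value": "safari", "sign": 1, "sampling_weight": 2.0, "duration_ms": 1.0},
    {"attr_key": "region", "attr_value": "us", "sign": 1, "sampling_weight": 8.0, "duration_ms": 2.0},
]
top = limit_by(weighted_counts(rows), per_key=2)
```

With `per_key=2`, the third `browser` value (`safari`) is dropped even though the overall limit has not been reached, which is the effect `LIMIT 10 BY attr_keys` has at larger scale.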
