wip: Add EndpointTraceItemStats RPC #6809

shruthilayaj · 2025-01-23T16:22:03Z

For comparison workflows, we need the count distribution of all attribute/value pairs (both string and numeric, though we'll focus on strings first). We're introducing a trace item stats endpoint to provide that. The aim is a fairly generic endpoint that gives a high-level overview of the shape of your data.

This implementation is fairly simple and restricted to string attributes to begin with. It returns the frequency distribution of attribute values for the items matching the requested filter. Results are ordered so that the keys with the highest-frequency attribute values come first.
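To make the intended output shape concrete, here is a minimal sketch in plain Python, not the endpoint's actual code. The function name `attribute_value_counts` and the sample spans are hypothetical, and the sketch counts unweighted occurrences for simplicity.

```python
from collections import Counter, defaultdict


def attribute_value_counts(items):
    """Count how often each value occurs for each string attribute key.

    `items` is an iterable of dicts mapping attribute keys to string values.
    (Hypothetical helper for illustration only.)
    """
    counts = defaultdict(Counter)
    for attrs in items:
        for key, value in attrs.items():
            counts[key][value] += 1

    # Order keys by the count of their most frequent value, highest first.
    ordered_keys = sorted(
        counts, key=lambda k: max(counts[k].values()), reverse=True
    )
    # For each key, list its values from most to least frequent.
    return [(key, counts[key].most_common()) for key in ordered_keys]


# Made-up sample data standing in for spans that match a filter.
spans = [
    {"browser": "chrome", "region": "us"},
    {"browser": "chrome", "region": "eu"},
    {"browser": "firefox", "region": "us"},
    {"browser": "chrome"},
]
result = attribute_value_counts(spans)
```

Here `browser` sorts ahead of `region` because its most frequent value (`chrome`, seen 3 times) outranks the most frequent `region` value (`us`, seen twice).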

shruthilayaj · 2025-01-23T16:50:22Z

snuba/web/rpc/v1/endpoint_trace_item_stats.py

+    concat_attr_maps = FunctionCall(
+        alias="attr_str_concat",
+        function_name="mapConcat",
+        parameters=tuple(column(f"attr_str_{i}") for i in range(ATTRIBUTE_BUCKETS)),
+    )
+    attrs_string_keys = tupleElement(
+        "attr_key", arrayJoin("attr_str", concat_attr_maps), Literal(None, 1)
+    )
+    attrs_string_values = tupleElement(
+        "attr_value",
+        arrayJoin("attr_str", concat_attr_maps),
+        Literal(None, 2),
+    )
+
+    selected_columns = [
+        SelectedExpression(
+            name="attr_key",
+            expression=attrs_string_keys,
+        ),
+        SelectedExpression(
+            name="attr_value",
+            expression=attrs_string_values,
+        ),
+        SelectedExpression(
+            name=aggregation.label,
+            expression=aggregation_to_expression(aggregation),
+        ),
+    ]


SELECT
    (arrayJoin(mapConcat(attr_str_0, attr_str_1, attr_str_2, attr_str_3, attr_str_4, attr_str_5, attr_str_6, attr_str_7, attr_str_8, attr_str_9, attr_str_10, attr_str_11, attr_str_12, attr_str_13, attr_str_14, attr_str_15, attr_str_16, attr_str_17, attr_str_18, attr_str_19) AS attr_str_concat) AS attr_str).1 AS attr_keys,
    attr_str.2 AS attr_values,
    sumIf(sign * sampling_weight, ((duration_micro / 1000) AS `sentry.duration_ms`) IS NOT NULL) AS count
FROM eap_spans_2_local
WHERE (project_id IN [1, 2, 3])
    AND (organization_id = 1)
    AND less(_sort_timestamp, toDateTime(1737500400))
    AND greaterOrEquals(_sort_timestamp, toDateTime(1737489600))
    AND true
GROUP BY attr_keys, attr_values
ORDER BY count DESC
LIMIT 10 BY attr_keys
LIMIT 0, 100

The resulting query looks like this ^
https://www.notion.so/sentry/Naive-approach-for-stats-endpoint-1848b10e4b5d80708bcffa341e0f225a
I ran this query locally and on snuba admin and documented the results. Snuba admin is being a little buggy, so I couldn't do anything more comprehensive.
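For readers less familiar with the ClickHouse constructs in the query above, the following is a rough Python emulation of its steps, offered as a sketch rather than a description of how ClickHouse executes it. The function `top_values_per_key`, its parameters, and the sample rows are hypothetical. It assumes each attribute key lives in exactly one bucket, and that `weight` stands in for `sign * sampling_weight`.

```python
from collections import defaultdict


def top_values_per_key(rows, per_key_limit=10, total_limit=100):
    """Emulate the query's shape in plain Python (illustrative only).

    `rows` is an iterable of (buckets, weight) pairs, where `buckets` is a
    list of dicts standing in for the attr_str_0..attr_str_N map columns.
    """
    totals = defaultdict(float)
    for buckets, weight in rows:
        merged = {}
        for bucket in buckets:
            merged.update(bucket)  # mapConcat: merge the bucket maps
        for key, value in merged.items():  # arrayJoin: one row per pair
            totals[(key, value)] += weight  # GROUP BY key, value + weighted sum

    # ORDER BY count DESC
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

    kept = []
    seen_per_key = defaultdict(int)
    for (key, value), count in ranked:
        if seen_per_key[key] < per_key_limit:  # LIMIT n BY key
            seen_per_key[key] += 1
            kept.append((key, value, count))
    return kept[:total_limit]  # LIMIT 0, total_limit


# Made-up rows: two buckets per row, with a sampling weight.
rows = [
    ([{"browser": "chrome"}, {"region": "us"}], 2.0),
    ([{"browser": "firefox"}, {"region": "us"}], 1.0),
    ([{"browser": "chrome"}, {}], 1.5),
]
top = top_values_per_key(rows, per_key_limit=1)
```

With `per_key_limit=1`, only the highest-weighted value of each key survives, which mirrors what `LIMIT 10 BY attr_keys` does with a limit of 10.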

shruthilayaj added 2 commits January 23, 2025 11:16

feat: Add EndpointTraceItemStats RPC

1a8acfd

fix types

d9d4a4a

shruthilayaj added 3 commits January 23, 2025 15:22

comment

fdd56dd

update comment

583a15a

typo

df9f2e0
