Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1887694 - Generate events_stream SQL for events #2196

Merged
merged 3 commits into from
Apr 24, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
61 changes: 45 additions & 16 deletions src/data/gleanSql.js
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
export const getGleanQuery = (columnName, table) => `
-- Auto-generated by the Glean Dictionary.
-- https://docs.telemetry.mozilla.org/cookbooks/accessing_glean_data.html#accessing-glean-data-in-bigquery

SELECT
${columnName}
FROM
Expand All @@ -13,9 +13,9 @@ WHERE
-- IMPORTANT: Remove the limit clause when the query is ready.
LIMIT 10`;

export const getGleanEventQuery = (table, additionalInfo) => `
-- Auto-generated by the Glean Dictionary.
-- https://docs.telemetry.mozilla.org/cookbooks/accessing_glean_data.html#event-metrics \n
export const getGleanLegacyEventQuery = (table, additionalInfo) => `
-- Auto-generated by the Glean Dictionary.
-- https://docs.telemetry.mozilla.org/cookbooks/accessing_glean_data.html#event-metrics

WITH events AS (
SELECT
Expand All @@ -39,18 +39,47 @@ SELECT * FROM events
-- IMPORTANT: Remove the limit clause when the query is ready.
LIMIT 10`;

export const getGleanPingQuery = (table) => `
-- Autogenerated by the Glean Dictionary.
-- https://mozilla.github.io/glean/book/user/pings/index.html#payload-structure
export const getGleanEventQuery = (table, additionalInfo) => `
-- Auto-generated by the Glean Dictionary.
-- https://docs.telemetry.mozilla.org/cookbooks/accessing_glean_data.html#event-metrics

WITH events AS (
SELECT
client_info.*,
ping_info.*,
metrics.*
FROM
${table} AS m
-- The table is partitioned by submission timestamp, so we need to
-- include it in the query. We use event_timestamp instead to understand
-- when the event happened exactly at the source.
submission_timestamp,
Dexterp37 marked this conversation as resolved.
Show resolved Hide resolved
event_timestamp,
event,
event_extra
FROM ${table} as e
WHERE
-- Pick yesterday's data from stable/historical tables.
-- Pick data for the last two days from stable/historical tables.
-- https://docs.telemetry.mozilla.org/cookbooks/bigquery/querying.html#table-layout-and-naming
DATE(submission_timestamp) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
-- IMPORTANT: Remove the limit clause when the query is ready.
LIMIT 10`;
date(submission_timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY)
AND event = '${additionalInfo.category}.${additionalInfo.name}'
-- Event extras can be queried as a JSON object. Here is an example
-- to only fetch events that carry an extra property named 'the_name'
-- with a value of 'the_value'.
-- AND JSON_VALUE(event_extra.the_name) = 'the_value'
)
SELECT event_timestamp, event, event_extra FROM events
ORDER BY event_timestamp DESC
-- IMPORTANT: Remove the limit clause when the query is ready.
LIMIT 10`;

export const getGleanPingQuery = (table) => `
-- Autogenerated by the Glean Dictionary.
-- https://mozilla.github.io/glean/book/user/pings/index.html#payload-structure
SELECT
client_info.*,
ping_info.*,
metrics.*
FROM
${table} AS m
WHERE
-- Pick yesterday's data from stable/historical tables.
-- https://docs.telemetry.mozilla.org/cookbooks/bigquery/querying.html#table-layout-and-naming
DATE(submission_timestamp) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
-- IMPORTANT: Remove the limit clause when the query is ready.
LIMIT 10`;
9 changes: 8 additions & 1 deletion src/pages/MetricDetail.svelte
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,14 @@
additionalInfo
) {
if (metricType === "event") {
return getGleanEventQuery(table, additionalInfo);
// Although events might come from other pings, we override that and
// generate SQL just for the `events_stream`. This changes
// `some_database.table` to `some_database.events_stream`.
Comment on lines +57 to +59
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure if we want to retain the old SQL code generation. Does it make any sense? This is quite a hack @badboy

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For events that are not in an events ping this will be incorrect with this change, because they don't end up in the stream just yet.

But otherwise it's fine; no point in keeping the legacy way around

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, what about this? We retain the old query for non 'events' pings until this gets fixed.

const tableNameParts = table.split('.');
const override = `${tableNameParts[0]}.events_stream`;
return tableNameParts[1] === "events"
? getGleanEventQuery(override, additionalInfo)
: getGleanLegacyEventQuery(table, additionalInfo);
}

return getGleanQuery(columnName, table);
Expand Down