Speed up sourceUrl backfill scripts by leveraging indexes
#10039
33b757c to 80f719e
Left 2 nits
@@ -44,10 +48,15 @@ async function backfillFolders(
`SELECT id, "internalId", "url"
 FROM webcrawler_folders
 WHERE id > :lastId
 AND "connectorId" = :connectorId -- does not leverage any index, we'll see if too slow or not
yep likely not used but still good to have there
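For reference, the `id > :lastId` loop in this hunk is a keyset pagination pattern. A minimal sketch of it follows. It assumes the query also orders by `id` and limits to the batch size (the hunk is truncated, so this is a guess). `FolderRow`, `FetchBatch`, and `forEachBatch` are hypothetical names for illustration, not the script's own. The empty-batch guard is an addition here, and the real script may already handle that case elsewhere.

```typescript
// Hypothetical row shape, mirroring the columns selected in the query above.
interface FolderRow {
  id: number;
  internalId: string;
  url: string;
}

// Hypothetical fetcher: returns at most `limit` rows with id > lastId,
// ordered by id ascending (assumed, the hunk does not show ORDER BY / LIMIT).
type FetchBatch = (lastId: number, limit: number) => Promise<FolderRow[]>;

// Walks the table in id order, one batch at a time, and returns the
// number of rows handled.
async function forEachBatch(
  fetchBatch: FetchBatch,
  batchSize: number,
  handle: (rows: FolderRow[]) => Promise<void>
): Promise<number> {
  let lastId = 0;
  let total = 0;
  let rows: FolderRow[];
  do {
    rows = await fetchBatch(lastId, batchSize);
    const last = rows[rows.length - 1];
    if (last === undefined) {
      // Empty batch: happens when the row count is an exact multiple of
      // batchSize. Without this guard, reading `.id` below would throw.
      break;
    }
    await handle(rows);
    total += rows.length;
    // Resume strictly after the last id seen, so no row is read twice.
    lastId = last.id;
  } while (rows.length === batchSize);
  return total;
}
```

Since each batch resumes from the last `id` seen, the primary key drives the scan, and the `"connectorId"` predicate only filters rows within that range.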
    lastId = rows[rows.length - 1].id;
  } while (rows.length === BATCH_SIZE);
}
) {}
nit: you could remove
FROM (SELECT unnest(ARRAY [:nodeIds]::text[]) as node_id,
             unnest(ARRAY [:urls]::text[]) as url) urls
WHERE data_sources_nodes.node_id = urls.node_id
AND data_sources_nodes.data_source = :dataSourceId;`,
pretty sure sql is smart enough to use the index despite the order being backwards, but only ~83% sure so keep an 👁️ 🙂
🧠
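On the `unnest` pair in this query: the two arrays are zipped positionally, so `:nodeIds` and `:urls` need to be the same length and in the same order. Below is a minimal sketch of a helper that builds them from one list of rows. `NodeUrl` and `toUnnestParams` are hypothetical names, not taken from the script. Dropping duplicate node ids is a choice made for this sketch: PostgreSQL documents that when a target row of `UPDATE ... FROM` joins to more than one source row, only one is used and which one is not predictable. Keeping the first occurrence is arbitrary.

```typescript
// Hypothetical shape of one node whose URL should be backfilled.
interface NodeUrl {
  nodeId: string;
  url: string;
}

// Builds two parallel arrays for binding to :nodeIds and :urls.
// Both come out of the same loop, so they always have equal length
// and matching order.
function toUnnestParams(rows: NodeUrl[]): { nodeIds: string[]; urls: string[] } {
  const nodeIds: string[] = [];
  const urls: string[] = [];
  const seen = new Set<string>();
  for (const row of rows) {
    if (seen.has(row.nodeId)) {
      // Skip duplicates so each target row joins to at most one source row.
      continue;
    }
    seen.add(row.nodeId);
    nodeIds.push(row.nodeId);
    urls.push(row.url);
  }
  return { nodeIds, urls };
}
```

The returned object can be spread into the query replacements alongside `dataSourceId`. Whether the planner then picks the composite index is best confirmed with `EXPLAIN` on a real batch.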
…id) and not (node_id, data_source)
Description
data_sources_documents
) and leveraging an index.
Risk
n/a
Deploy Plan
no deploy