Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(fuzzer): Add Semi Filter to Join Fuzzer #11473

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

DanielHunte
Copy link

Summary: This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean columns. The next steps will be to support integer columns.

Differential Revision: D65629460

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 7, 2024
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65629460

Copy link

netlify bot commented Nov 7, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit ffeb950
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/673bd7c3cbe97d000871b467

DanielHunte pushed a commit to DanielHunte/velox that referenced this pull request Nov 7, 2024
Summary:

This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean columns. The next steps will be to support integer columns.

Differential Revision: D65629460
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65629460

DanielHunte pushed a commit to DanielHunte/velox that referenced this pull request Nov 7, 2024
Summary:

This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean columns. The next steps will be to support integer columns.

Differential Revision: D65629460
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65629460

DanielHunte pushed a commit to DanielHunte/velox that referenced this pull request Nov 7, 2024
Summary:

This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean columns. The next steps will be to support integer columns.

Differential Revision: D65629460
DanielHunte pushed a commit to DanielHunte/velox that referenced this pull request Nov 7, 2024
Summary:

This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean columns. The next steps will be to support integer columns.

Differential Revision: D65629460
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65629460

1 similar comment
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65629460

Copy link
Contributor

@kagamiori kagamiori left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @DanielHunte, thank you for adding the support for join filters! I left some comments. By the way, why is it called "semi filter" in the PR title? Are the filters related to semi joins?

Comment on lines 736 to 737
std::string filter =
makeJoinFilter(probeKeys, buildKeys, probeInput, buildInput);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This filter is not used in the query plan?

Comment on lines 409 to 417
if (vectorFuzzer_.coinToss(0.1) && !probeInput.empty() &&
!buildInput.empty()) {
RowTypePtr rowType = vectorFuzzer_.coinToss(0.5)
? asRowType(probeInput[0]->type())
: asRowType(buildInput[0]->type());

for (int i = 0; i < rowType->size(); i++) {
// TODO: Add support for non-boolean types.
if (rowType->childAt(i)->isBoolean() && vectorFuzzer_.coinToss(0.5)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why tossing a coin at line 417 again? Also, it feels like the chance of choosing to generate a filter and rowType happens to contain a boolean column is low. What about we decide whether to generate a join filter and the filter column type in the verify() method first, and if we do, we then make build input or probe input to contain a payload column of the chosen type?

const std::string filter =
withFilter ? makeJoinFilter(probeKeys, buildKeys) : "";
const std::string filter = withFilter
? makeJoinFilter(probeKeys, buildKeys, probeInput, buildInput)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's check out all the methods in this file that create join query plans. We need to test joins with filters not only for nested loop joins.

DanielHunte pushed a commit to DanielHunte/velox that referenced this pull request Nov 13, 2024
Summary:

This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean columns. The next steps will be to support integer columns.

Differential Revision: D65629460
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65629460

DanielHunte pushed a commit to DanielHunte/velox that referenced this pull request Nov 13, 2024
Summary:

This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean columns. The next steps will be to support integer columns.

Differential Revision: D65629460
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65629460

DanielHunte pushed a commit to DanielHunte/velox that referenced this pull request Nov 13, 2024
Summary:

This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean and integer columns.

Differential Revision: D65629460
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65629460

DanielHunte pushed a commit to DanielHunte/velox that referenced this pull request Nov 13, 2024
Summary:

This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean and integer columns.

Differential Revision: D65629460
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65629460

DanielHunte pushed a commit to DanielHunte/velox that referenced this pull request Nov 13, 2024
Summary:

This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean and integer columns.

Differential Revision: D65629460
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65629460

DanielHunte pushed a commit to DanielHunte/velox that referenced this pull request Nov 13, 2024
Summary:

This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean and integer columns.

Differential Revision: D65629460
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65629460

Copy link
Contributor

@kagamiori kagamiori left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks good to me overall. I'll take a look at the other diff for ReferenceQueryRunner.

@@ -1014,10 +1034,30 @@ void JoinFuzzer::verify(core::JoinType joinType) {
const auto numKeys = nullAware ? 1 : randInt(1, 5);

// Pick number and types of join keys.
const std::vector<TypePtr> keyTypes = generateJoinKeyTypes(numKeys);
std::vector<TypePtr> keyTypes = generateJoinKeyTypes(numKeys);
std::string semiFilter;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to avoid creating a new term, especially because the term "semi" could make people think about semi-join and cause confusion. What about we rename it as "nonKeyFilter" or just "extraFilter"?

DanielHunte pushed a commit to DanielHunte/velox that referenced this pull request Nov 18, 2024
Summary:

This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean and integer columns.

Differential Revision: D65629460
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65629460

DanielHunte pushed a commit to DanielHunte/velox that referenced this pull request Nov 18, 2024
Summary:

This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean and integer columns.

Differential Revision: D65629460
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65629460

DanielHunte pushed a commit to DanielHunte/velox that referenced this pull request Nov 18, 2024
Summary:

This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean and integer columns.

Differential Revision: D65629460
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65629460

@DanielHunte DanielHunte changed the title Add Semi Filter to Join Fuzzer feat(fuzzer): Add Semi Filter to Join Fuzzer Nov 18, 2024
Daniel Hunte added 3 commits November 18, 2024 16:11
…eferenceQueryRunners (facebookincubator#11566)

Summary:

This change updates both the DuckQueryRunner and PrestoQueryRunner to parse filters in their hasJoinNode toSql methods.

Differential Revision: D66021799
…ery Runners (facebookincubator#11576)

Summary:

The select clause is completely missing in the produced query string.

Differential Revision: D66132514
Summary:

This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean and integer columns.

Differential Revision: D65629460
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65629460

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants