-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(fuzzer): Add Semi Filter to Join Fuzzer #11473
base: main
Are you sure you want to change the base?
Conversation
This pull request was exported from Phabricator. Differential Revision: D65629460 |
✅ Deploy Preview for meta-velox canceled.
|
Summary: This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean columns. The next steps will be to support integer columns. Differential Revision: D65629460
4524fb2
to
a0ed8ee
Compare
This pull request was exported from Phabricator. Differential Revision: D65629460 |
Summary: This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean columns. The next steps will be to support integer columns. Differential Revision: D65629460
a0ed8ee
to
aebf152
Compare
This pull request was exported from Phabricator. Differential Revision: D65629460 |
Summary: This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean columns. The next steps will be to support integer columns. Differential Revision: D65629460
aebf152
to
e1dd45b
Compare
Summary: This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean columns. The next steps will be to support integer columns. Differential Revision: D65629460
e1dd45b
to
5b5cce0
Compare
This pull request was exported from Phabricator. Differential Revision: D65629460 |
1 similar comment
This pull request was exported from Phabricator. Differential Revision: D65629460 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @DanielHunte, thank you for adding the support for join filters! I left some comments. By the way, why is it called "semi filter" in the PR title? Are the filters related to semi joins?
velox/exec/fuzzer/JoinFuzzer.cpp
Outdated
std::string filter = | ||
makeJoinFilter(probeKeys, buildKeys, probeInput, buildInput); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This filter is not used in the query plan?
velox/exec/fuzzer/JoinFuzzer.cpp
Outdated
if (vectorFuzzer_.coinToss(0.1) && !probeInput.empty() && | ||
!buildInput.empty()) { | ||
RowTypePtr rowType = vectorFuzzer_.coinToss(0.5) | ||
? asRowType(probeInput[0]->type()) | ||
: asRowType(buildInput[0]->type()); | ||
|
||
for (int i = 0; i < rowType->size(); i++) { | ||
// TODO: Add support for non-boolean types. | ||
if (rowType->childAt(i)->isBoolean() && vectorFuzzer_.coinToss(0.5)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why tossing a coin at line 417 again? Also, it feels like the chance of choosing to generate a filter and rowType happens to contain a boolean column is low. What about we decide whether to generate a join filter and the filter column type in the verify() method first, and if we do, we then make build input or probe input to contain a payload column of the chosen type?
velox/exec/fuzzer/JoinFuzzer.cpp
Outdated
const std::string filter = | ||
withFilter ? makeJoinFilter(probeKeys, buildKeys) : ""; | ||
const std::string filter = withFilter | ||
? makeJoinFilter(probeKeys, buildKeys, probeInput, buildInput) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's check out all the methods in this file that create join query plans. We need to test joins with filters not only for nested loop joins.
5b5cce0
to
cc2a5c4
Compare
Summary: This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean columns. The next steps will be to support integer columns. Differential Revision: D65629460
This pull request was exported from Phabricator. Differential Revision: D65629460 |
cc2a5c4
to
5b97910
Compare
Summary: This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean columns. The next steps will be to support integer columns. Differential Revision: D65629460
This pull request was exported from Phabricator. Differential Revision: D65629460 |
Summary: This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean and integer columns. Differential Revision: D65629460
5b97910
to
e9b024b
Compare
This pull request was exported from Phabricator. Differential Revision: D65629460 |
Summary: This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean and integer columns. Differential Revision: D65629460
e9b024b
to
cde18d0
Compare
This pull request was exported from Phabricator. Differential Revision: D65629460 |
Summary: This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean and integer columns. Differential Revision: D65629460
cde18d0
to
c349847
Compare
This pull request was exported from Phabricator. Differential Revision: D65629460 |
c349847
to
13fe9ce
Compare
Summary: This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean and integer columns. Differential Revision: D65629460
This pull request was exported from Phabricator. Differential Revision: D65629460 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR looks good to me overall. I'll take a look at the other diff for ReferenceQueryRunner.
velox/exec/fuzzer/JoinFuzzer.cpp
Outdated
@@ -1014,10 +1034,30 @@ void JoinFuzzer::verify(core::JoinType joinType) { | |||
const auto numKeys = nullAware ? 1 : randInt(1, 5); | |||
|
|||
// Pick number and types of join keys. | |||
const std::vector<TypePtr> keyTypes = generateJoinKeyTypes(numKeys); | |||
std::vector<TypePtr> keyTypes = generateJoinKeyTypes(numKeys); | |||
std::string semiFilter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to avoid creating a new term, especially because the term "semi" could make people think about semi-join and cause confusion. What about we rename it as "nonKeyFilter" or just "extraFilter"?
13fe9ce
to
deb4a87
Compare
Summary: This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean and integer columns. Differential Revision: D65629460
This pull request was exported from Phabricator. Differential Revision: D65629460 |
Summary: This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean and integer columns. Differential Revision: D65629460
deb4a87
to
4d06002
Compare
This pull request was exported from Phabricator. Differential Revision: D65629460 |
Summary: This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean and integer columns. Differential Revision: D65629460
4d06002
to
3bb0edd
Compare
This pull request was exported from Phabricator. Differential Revision: D65629460 |
…eferenceQueryRunners (facebookincubator#11566) Summary: This change updates both the DuckQueryRunner and PrestoQueryRunner to parse filters in their hasJoinNode toSql methods. Differential Revision: D66021799
…ery Runners (facebookincubator#11576) Summary: The select clause is completely missing in the produced query string. Differential Revision: D66132514
Summary: This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean and integer columns. Differential Revision: D65629460
3bb0edd
to
ffeb950
Compare
This pull request was exported from Phabricator. Differential Revision: D65629460 |
Summary: This changes adds a semi filter to the join filter 10% of the time. Currently it only supports boolean columns. The next steps will be to support integer columns.
Differential Revision: D65629460