All join clauses not handled properly #248

Open
Thomzoy opened this issue Jan 17, 2025 · 0 comments · May be fixed by #249
Labels
bug Something isn't working

Thomzoy commented Jan 17, 2025

When running df.join, the different values of the how argument aren't always handled correctly:

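import pandas as pd

# Two small tables sharing the "my_id" key, converted to lists of row dicts.
df_1 = pd.DataFrame(
    {
        "A": ["A", "A", "B", "C"],
        "my_id": [1, 2, 3, 3],
    }
).to_dict("records")

df_2 = pd.DataFrame(
    {
        "B": ["A", "A", "B", "C"],
        "my_id": [1, 2, 3, 4],
    }
).to_dict("records")

# `session` is an already-created session object.
df_1 = session.createDataFrame(df_1)
df_2 = session.createDataFrame(df_2)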
  • Default side

When the join side isn't provided, it can cause an exception downstream. For instance, with the DuckDB backend, how="outer" produces a Parser Error: syntax error at or near "OUTER", because no side is emitted and DuckDB requires one here.
This may be fixable in sqlglot, but we can also map every valid value of how to an equivalent value that includes the default side.
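A minimal sketch of that mapping idea, assuming the accepted aliases are the ones PySpark documents for how. The names `_CANONICAL_JOIN_TYPES` and `normalize_join_type` are hypothetical and are not taken from the library or from the linked pull request:

```python
# Hypothetical helper: map each accepted `how` alias to a canonical join
# type that always names a side, so the SQL generator never has to emit
# a bare "OUTER".
_CANONICAL_JOIN_TYPES = {
    "inner": "inner",
    "cross": "cross",
    "outer": "full_outer",
    "full": "full_outer",
    "fullouter": "full_outer",
    "full_outer": "full_outer",
    "left": "left_outer",
    "leftouter": "left_outer",
    "left_outer": "left_outer",
    "right": "right_outer",
    "rightouter": "right_outer",
    "right_outer": "right_outer",
    "semi": "left_semi",
    "leftsemi": "left_semi",
    "left_semi": "left_semi",
    "anti": "left_anti",
    "leftanti": "left_anti",
    "left_anti": "left_anti",
}


def normalize_join_type(how: str) -> str:
    """Return the canonical, side-qualified join type for a `how` alias."""
    key = how.strip().lower()
    try:
        return _CANONICAL_JOIN_TYPES[key]
    except KeyError:
        raise ValueError(f"Unsupported join type: {how!r}") from None
```

With this, normalize_join_type("outer") yields "full_outer", so the side is always explicit before any SQL is generated. How the canonical value is then spelled in SQL is left to the backend.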

  • Output column selection

When running a SEMI or ANTI join, the output column selection fails because it includes columns from both tables.
It can be fixed by handling this case separately and only adding the columns of self (the left DataFrame).
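The column rule can be sketched as follows. The function `select_join_columns` is hypothetical and works on plain column-name lists. The non-semi branch is a simplification that assumes the join is keyed on shared column names, and the real output order may differ:

```python
from typing import List, Sequence

# Aliases that denote a semi or anti join once underscores are removed.
_LEFT_ONLY_JOINS = {"semi", "leftsemi", "anti", "leftanti"}


def select_join_columns(
    how: str,
    left_columns: Sequence[str],
    right_columns: Sequence[str],
    join_keys: Sequence[str],
) -> List[str]:
    """Pick the output columns of a join (illustrative, not the library's code)."""
    kind = how.strip().lower().replace("_", "")
    if kind in _LEFT_ONLY_JOINS:
        # SEMI and ANTI joins only filter the left table, so the right
        # table contributes no output columns.
        return list(left_columns)
    # Other joins: keep the left columns, then the right columns that are
    # not join keys, so each key appears once.
    keys = set(join_keys)
    return list(left_columns) + [c for c in right_columns if c not in keys]
```

For the two DataFrames above, select_join_columns("semi", ["A", "my_id"], ["B", "my_id"], ["my_id"]) returns only the left columns, while how="inner" also includes "B".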

I suggested an implementation in the linked pull request, but haven't tested it thoroughly yet!

@Thomzoy Thomzoy linked a pull request Jan 17, 2025 that will close this issue
@eakmanrq eakmanrq added the bug Something isn't working label Jan 18, 2025