docs(python): Minor tweak in code example in section Coming from Pandas #11764

Merged
merged 5 commits into from
Oct 17, 2023


jrycw
Contributor

@jrycw jrycw commented Oct 16, 2023

While it's true that Polars can execute data transformations in parallel using expressions, some of the examples in the documentation don't provide a true 1-to-1 comparison between Pandas and Polars. To make the comparison more like-for-like, I would recommend contrasting the Pandas assign and query methods with the Polars with_columns and filter methods.

@github-actions bot added the labels documentation (Improvements or additions to documentation) and python (Related to Python Polars) Oct 16, 2023
Collaborator

@MarcoGorelli MarcoGorelli left a comment


Thanks for your PR, generally looks good; just got a minor request.

Also, you'll need to address the lint error.

Comment on lines 177 to 179
df.loc[df["c"] == 2, "a"] = df.loc[df["c"] == 2, "b"]
df.assign(a=lambda df_: np.where(df_.c == 2, df_.b, df_.a))
Collaborator


Slightly more idiomatic would be to use pandas.Series.where directly:

df.assign(a=lambda df_: df_.a.where(df_.c != 2, df_.b))

Contributor Author


Thanks @MarcoGorelli, I wasn't aware that a pandas Series has its own where method.

Comment on lines +151 to +152
tenXValue=lambda df_: df_.value * 10,
hundredXValue=lambda df_: df_.value * 100
Collaborator


I think the Polars example should probably also use this syntax then, i.e.:

df.with_columns(
    tenXValue=pl.col("value") * 10,
    hundredXValue=pl.col("value") * 100,
)

Contributor Author


I'm not entirely certain, but it appears that using .alias is the recommended approach for naming the resulting columns.

Collaborator


Contributor Author


Considering this is a brief migration guide to help newcomers from Pandas, I agree with your perspective.

Collaborator

@MarcoGorelli MarcoGorelli left a comment


Looks like an improvement, and makes the comparisons more "apples to apples" - nice one @jrycw !

@ritchie46 ritchie46 merged commit 4fb3f07 into pola-rs:main Oct 17, 2023
4 checks passed
@jrycw jrycw deleted the modify-migrating-from-pandas-code-example branch October 17, 2023 05:32
3 participants