docs(python): Minor tweak in code example in section Coming from Pandas #11764
thanks for your PR - generally looks good, just got a minor request
also, will need to address the lint error
docs/user-guide/migration/pandas.md
Outdated
df.loc[df["c"] == 2, "a"] = df.loc[df["c"] == 2, "b"]
df.assign(a=lambda df_: np.where(df_.c == 2, df_.b, df_.a))
slightly more idiomatic would be to use pandas.Series.where directly
df.assign(a=lambda df_: df_.a.where(df_.c != 2, df_.b))
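For reference, here is a minimal sketch comparing the three pandas spellings mentioned in this review. The sample data is made up for illustration and is not taken from the docs; note that Series.where keeps values where the condition is true, which is why the condition is inverted relative to np.where.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only to exercise the condition c == 2.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30], "c": [1, 2, 2]})

# Original in-place form: mutates the frame, so work on a copy here.
in_place = df.copy()
in_place.loc[in_place["c"] == 2, "a"] = in_place.loc[in_place["c"] == 2, "b"]

# np.where(cond, x, y): take b where c == 2, otherwise keep a.
via_np_where = df.assign(a=lambda df_: np.where(df_.c == 2, df_.b, df_.a))

# Series.where(cond, other): keep a where c != 2, otherwise take b.
via_series_where = df.assign(a=lambda df_: df_.a.where(df_.c != 2, df_.b))
```

All three variants should yield the same column a; the two assign versions leave df itself untouched.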
Thanks @MarcoGorelli, I wasn't aware that Pandas dataframe includes its own where method.
tenXValue=lambda df_: df_.value * 10,
hundredXValue=lambda df_: df_.value * 100
I think the polars example should probably also use this syntax then, i.e.
df.with_columns(
tenXValue=pl.col("value") * 10,
hundredXValue=pl.col("value") * 100,
)
I'm not entirely certain, but it appears that using .alias is the recommended approach for renaming columns.
this would just minimise the difference
Considering this is a brief migration guide to help newcomers from Pandas, I agree with your perspective.
Looks like an improvement, and makes the comparisons more "apples to apples" - nice one @jrycw!
While it's true that Polars offers the capability to execute data transformations in parallel using expressions, some of the examples in the documentation don't provide a complete 1-to-1 comparison between Pandas and Polars. To make this comparison more comprehensive, I would recommend contrasting Pandas assign and query methods with Polars with_columns and filter methods.