From 4fb3f070ab0c4af5f606a26f8b9ae879d4cd8801 Mon Sep 17 00:00:00 2001
From: Jerry Wu
Date: Tue, 17 Oct 2023 13:04:05 +0800
Subject: [PATCH] docs(python): Minor tweak in code example in section Coming from Pandas (#11764)

---
 docs/user-guide/migration/pandas.md | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/docs/user-guide/migration/pandas.md b/docs/user-guide/migration/pandas.md
index d6674ac43f06..a9a039a7b1d4 100644
--- a/docs/user-guide/migration/pandas.md
+++ b/docs/user-guide/migration/pandas.md
@@ -147,19 +147,20 @@ called `hundredXValue` where the `value` column is multiplied by 100.
 In `Pandas` this would be:
 
 ```python
-df["tenXValue"] = df["value"] * 10
-df["hundredXValue"] = df["value"] * 100
+df.assign(
+    tenXValue=lambda df_: df_.value * 10,
+    hundredXValue=lambda df_: df_.value * 100
+)
 ```
 
 These column assignments are executed sequentially.
 
-In `Polars` we add columns to `df` using the `.with_columns` method and name them with
-the `.alias` method:
+In `Polars` we add columns to `df` using the `.with_columns` method:
 
 ```python
 df.with_columns(
-    (pl.col("value") * 10).alias("tenXValue"),
-    (pl.col("value") * 100).alias("hundredXValue"),
+    tenXValue=pl.col("value") * 10,
+    hundredXValue=pl.col("value") * 100,
 )
 ```
 
@@ -174,7 +175,7 @@ the values in column `a` based on a condition. When the value in column `c` is e
 In `Pandas` this would be:
 
 ```python
-df.loc[df["c"] == 2, "a"] = df.loc[df["c"] == 2, "b"]
+df.assign(a=lambda df_: df_.a.where(df_.c != 2, df_.b))
 ```
 
 while in `Polars` this would be:
@@ -187,21 +188,17 @@ df.with_columns(
 )
 ```
 
-The `Polars` way is pure in that the original `DataFrame` is not modified. The `mask` is
-also not computed twice as in `Pandas` (you could prevent this in `Pandas`, but that
-would require setting a temporary variable).
-
-Additionally `Polars` can compute every branch of an `if -> then -> otherwise` in
+`Polars` can compute every branch of an `if -> then -> otherwise` in
 parallel. This is valuable, when the branches get more expensive to compute.
 
 #### Filtering
 
 We want to filter the dataframe `df` with housing data based on some criteria.
 
-In `Pandas` you filter the dataframe by passing Boolean expressions to the `loc` method:
+In `Pandas` you filter the dataframe by passing Boolean expressions to the `query` method:
 
 ```python
-df.loc[(df['sqft_living'] > 2500) & (df['price'] < 300000)]
+df.query('m2_living > 2500 and price < 300000')
 ```
 
 while in `Polars` you call the `filter` method:
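
Review note: the second hunk swaps an in-place `.loc` assignment for `.assign` combined with `Series.where`, which inverts the condition (`c != 2` marks the rows to keep). The sketch below is one way to check that the two `Pandas` snippets agree. The sample values are hypothetical and are not part of the patch; only the column names `a`, `b`, and `c` come from the documentation.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the patch.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30], "c": [2, 5, 2]})

# Removed version: in-place assignment through .loc (mutates its target),
# so it is applied to a copy here.
old = df.copy()
old.loc[old["c"] == 2, "a"] = old.loc[old["c"] == 2, "b"]

# Added version: .assign returns a new DataFrame and leaves df untouched.
# Series.where keeps values where the condition is True (c != 2) and
# takes the value from df_.b everywhere else.
new = df.assign(a=lambda df_: df_.a.where(df_.c != 2, df_.b))

print(old["a"].tolist() == new["a"].tolist())
print(df["a"].tolist())
```

Because `.assign` does not modify `df`, readers copying the new snippet need to bind the result to a name to keep it, which differs from the removed in-place form.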