docs(python): Minor tweak in code example in section Coming from Pandas (#11764)
jrycw authored Oct 17, 2023
1 parent ef503c3 commit 4fb3f07
Showing 1 changed file with 11 additions and 14 deletions.
25 changes: 11 additions & 14 deletions docs/user-guide/migration/pandas.md
@@ -147,19 +147,20 @@ called `hundredXValue` where the `value` column is multiplied by 100.
In `Pandas` this would be:

```python
-df["tenXValue"] = df["value"] * 10
-df["hundredXValue"] = df["value"] * 100
+df.assign(
+    tenXValue=lambda df_: df_.value * 10,
+    hundredXValue=lambda df_: df_.value * 100
+)
```

These column assignments are executed sequentially.
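
As a minimal sketch of that sequential behaviour (the sample values and the extra `doubled` column are illustrative assumptions, not part of the guide): keyword arguments to `DataFrame.assign` are evaluated in order, so a later lambda can read a column created by an earlier one, and the original frame is left unmodified.

```python
import pandas as pd

# Illustrative data; the values are made up for this sketch.
df = pd.DataFrame({"value": [1, 2, 3]})

out = df.assign(
    tenXValue=lambda df_: df_.value * 10,
    # Evaluated after tenXValue, so it can refer to the new column.
    doubled=lambda df_: df_.tenXValue * 2,
)

print(out)
# assign returns a new frame; df itself still has only the "value" column.
print(list(df.columns))
```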

-In `Polars` we add columns to `df` using the `.with_columns` method and name them with
-the `.alias` method:
+In `Polars` we add columns to `df` using the `.with_columns` method:

```python
df.with_columns(
-    (pl.col("value") * 10).alias("tenXValue"),
-    (pl.col("value") * 100).alias("hundredXValue"),
+    tenXValue=pl.col("value") * 10,
+    hundredXValue=pl.col("value") * 100,
)
```

@@ -174,7 +175,7 @@ the values in column `a` based on a condition. When the value in column `c` is e
In `Pandas` this would be:

```python
-df.loc[df["c"] == 2, "a"] = df.loc[df["c"] == 2, "b"]
+df.assign(a=lambda df_: df_.a.where(df_.c != 2, df_.b))
```
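
`Series.where(cond, other)` keeps the values where `cond` is `True` and takes them from `other` elsewhere, which is why the condition in the `assign` form is written as `c != 2`. A minimal sketch with made-up data, comparing the `assign` form with the in-place `loc` assignment applied to a copy:

```python
import pandas as pd

# Illustrative data; the values are made up for this sketch.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30], "c": [2, 0, 2]})

# Non-mutating form: keep a where c != 2, otherwise take b.
new = df.assign(a=lambda df_: df_.a.where(df_.c != 2, df_.b))

# In-place form, run on a copy so df stays unchanged for comparison.
old = df.copy()
old.loc[old["c"] == 2, "a"] = old.loc[old["c"] == 2, "b"]

print(new)
print(old)
```

Both forms replace `a` only in the rows where `c` equals 2; the `assign` form does so without modifying `df`.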

while in `Polars` this would be:
@@ -187,21 +188,17 @@ df.with_columns(
)
```

-The `Polars` way is pure in that the original `DataFrame` is not modified. The `mask` is
-also not computed twice as in `Pandas` (you could prevent this in `Pandas`, but that
-would require setting a temporary variable).
-
-Additionally `Polars` can compute every branch of an `if -> then -> otherwise` in
+`Polars` can compute every branch of an `if -> then -> otherwise` in
parallel. This is valuable, when the branches get more expensive to compute.

#### Filtering

We want to filter the dataframe `df` with housing data based on some criteria.

-In `Pandas` you filter the dataframe by passing Boolean expressions to the `loc` method:
+In `Pandas` you filter the dataframe by passing Boolean expressions to the `query` method:

```python
-df.loc[(df['sqft_living'] > 2500) & (df['price'] < 300000)]
+df.query('m2_living > 2500 and price < 300000')
```
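
`DataFrame.query` takes the filter as a string expression in which column names appear as bare identifiers and `and`/`or` combine conditions. A minimal sketch with made-up housing values (the data is an assumption for illustration), showing that it selects the same rows as an explicit Boolean mask:

```python
import pandas as pd

# Illustrative housing data; the values are made up for this sketch.
df = pd.DataFrame({
    "m2_living": [3000, 1800, 2600],
    "price": [250000, 200000, 450000],
})

# String-expression filter.
via_query = df.query("m2_living > 2500 and price < 300000")

# Equivalent explicit Boolean mask.
via_mask = df.loc[(df["m2_living"] > 2500) & (df["price"] < 300000)]

print(via_query)
```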

while in `Polars` you call the `filter` method: