from_pandas behaviour change/regression (series constructor "strict" changes) #18092

Open
2 tasks done
alexander-beedie opened this issue Aug 8, 2024 · 0 comments
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

alexander-beedie commented Aug 8, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

FYI: @stinodego, looks related to #16939.

from datetime import date

import numpy as np
import pandas as pd

pf = pd.DataFrame({
    "misc": np.array([
        '14.67%', '10.50%', '33.80%', date(1990, 7, 16),
    ]),
})
# misc
# 0      14.67%
# 1      10.50%
# 2      33.80%
# 3  1990-07-16

import polars as pl
pl.from_pandas(pf, schema_overrides={"misc": pl.String})
# TypeError: 'date' object cannot be converted to 'PyString'

Simplified further, the root cause looks like Series constructor strictness changes, as we get the same error even when setting strict=False:

values = np.array(['4.67%', '10.50%', '33.80%', date(1990, 7, 16)])
pl.Series("misc", values, dtype=pl.String, strict=False)
# TypeError: 'date' object cannot be converted to 'PyString'

Issue description

Previously (pre 1.0) this would convert the unexpected date value to null (as per non-strict behaviour).

Expected behavior

Previous behaviour would create the frame like so, with invalid values set to null:

shape: (4, 1)
┌────────┐
│ misc   │
│ ---    │
│ str    │
╞════════╡
│ 14.67% │
│ 10.50% │
│ 33.80% │
│ null   │
└────────┘

We should likely expose the "strict" param on from_pandas so that callers can opt in to this behaviour (the underlying pandas_to_pyseries and pandas_to_pydf methods already accept it).

Passing through "strict" without further changes doesn't seem sufficient to fix the issue though, as demonstrated in the Series simplification.
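
Until the constructor handles this case, the earlier null-on-mismatch result can be approximated by cleaning the values in plain Python before they reach Polars. The following is a minimal sketch of that idea, not a fix for the regression itself; the helper name `non_strings_to_none` is hypothetical and is not part of Polars or pandas.

```python
from datetime import date


def non_strings_to_none(values):
    """Return a list in which every element that is not a `str` becomes None.

    Hypothetical helper: it mimics the "invalid values become null" outcome
    described above, but does so before any Polars conversion takes place.
    """
    return [v if isinstance(v, str) else None for v in values]


# Same values as the simplified Series example above.
values = ["4.67%", "10.50%", "33.80%", date(1990, 7, 16)]
cleaned = non_strings_to_none(values)
print(cleaned)
```

Because `cleaned` contains only `str` and `None`, passing it to `pl.Series("misc", cleaned, dtype=pl.String)` should avoid the conversion error, with the date appearing as null. The trade-off is an extra Python-level pass over the data, which is why proper non-strict handling in the constructor is still the preferable outcome.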

Installed versions

Latest compiled `main`
@alexander-beedie alexander-beedie added bug Something isn't working python Related to Python Polars needs triage Awaiting prioritization by a maintainer labels Aug 8, 2024
@alexander-beedie alexander-beedie changed the title from_pandas behaviour change/regression from_pandas behaviour change/regression (series constructor "strict" changes) Aug 8, 2024