Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(rust): Prevent recursive AlreadyEncountered error formatting #12097

Closed
wants to merge 5 commits into from

Conversation

trueb2
Copy link
Contributor

@trueb2 trueb2 commented Oct 29, 2023

Fixes #8277

Successfully returns error messages with N=100 in debug.

Error message example with N=4 from the reproduction

thread 'main' panicked at 'called Result::unwrap() on an Err value: ColumnNotFound(ErrString("foo\n\nError originated just after this operation:\nDF ["a"]; PROJECT */1 COLUMNS; SELECTION: "None"\n\nError originated just after this operation:\nFILTER [(col("foo")) > (col("bar"))] FROM\nErrorStateSync(AlreadyEncountered(not found: foo\n\nError originated just after this operation:\nDF ["a"]; PROJECT */1 COLUMNS; SELECTION: "None"))\nDF ["a"]; PROJECT */1 COLUMNS; SELECTION: "None"\n\nError originated just after this operation:\nErrorStateSync(Multiple AlreadyEncountered)\nFILTER [(col("foo")) > (col("bar"))] FROM\nErrorStateSync(AlreadyEncountered(not found: foo\n\nError originated just after this operation:\nDF ["a"]; PROJECT */1 COLUMNS; SELECTION: "None"))\nDF ["a"]; PROJECT */1 COLUMNS; SELECTION: "None"\n\nError originated just after this operation:\nFILTER [(col("foo")) > (col("bar"))] FROM\nErrorStateSync(Multiple AlreadyEncountered)\nErrorStateSync(Multiple AlreadyEncountered)\nFILTER [(col("foo")) > (col("bar"))] FROM\nErrorStateSync(AlreadyEncountered(not found: foo\n\nError originated just after this operation:\nDF ["a"]; PROJECT */1 COLUMNS; SELECTION: "None"))\nDF ["a"]; PROJECT */1 COLUMNS; SELECTION: "None"\n\nError originated just after this operation:\nErrorStateSync(Multiple AlreadyEncountered)\nFILTER [(col("foo")) > (col("bar"))] FROM\nErrorStateSync(Multiple AlreadyEncountered)\nErrorStateSync(Multiple AlreadyEncountered)\nFILTER [(col("foo")) > (col("bar"))] FROM\nErrorStateSync(AlreadyEncountered(not found: foo\n\nError originated just after this operation:\nDF ["a"]; PROJECT */1 COLUMNS; SELECTION: "None"))\nDF ["a"]; PROJECT */1 COLUMNS; SELECTION: "None"\n\nError originated just after this operation:\nFILTER [(col("foo")) > (col("bar"))] FROM\nErrorStateSync(Multiple AlreadyEncountered)\nErrorStateSync(Multiple AlreadyEncountered)\nFILTER [(col("foo")) > (col("bar"))] FROM\nErrorStateSync(Multiple AlreadyEncountered)\nErrorStateSync(Multiple AlreadyEncountered)\nFILTER [(col("foo")) > (col("bar"))] FROM\nErrorStateSync(AlreadyEncountered(not found: foo\n\nError originated just after this operation:\nDF ["a"]; PROJECT */1 COLUMNS; SELECTION: "None"))\nDF ["a"]; PROJECT */1 COLUMNS; SELECTION: "None"\n\nError originated just after this operation:\nErrorStateSync(Multiple AlreadyEncountered)\nFILTER [(col("foo")) > (col("bar"))] FROM\nErrorStateSync(Multiple AlreadyEncountered)\nErrorStateSync(Multiple AlreadyEncountered)\nFILTER [(col("foo")) > (col("bar"))] FROM\nErrorStateSync(Multiple AlreadyEncountered)\nErrorStateSync(Multiple AlreadyEncountered)\nFILTER [(col("foo")) > (col("bar"))] FROM\nErrorStateSync(AlreadyEncountered(not found: foo\n\nError originated just after this operation:\nDF ["a"]; PROJECT */1 COLUMNS; SELECTION: "None"))\nDF ["a"]; PROJECT */1 COLUMNS; SELECTION: "None"\n\nError originated just after this operation:\nFILTER [(col("foo")) > (col("bar"))] FROM\nErrorStateSync(Multiple AlreadyEncountered)\nErrorStateSync(Multiple AlreadyEncountered)\nFILTER [(col("foo")) > (col("bar"))] FROM\nErrorStateSync(Multiple AlreadyEncountered)\nErrorStateSync(Multiple AlreadyEncountered)\nFILTER [(col("foo")) > (col("bar"))] FROM\nErrorStateSync(Multiple AlreadyEncountered)\nErrorStateSync(Multiple AlreadyEncountered)\nFILTER [(col("foo")) > (col("bar"))] FROM\nErrorStateSync(AlreadyEncountered(not found: foo\n\nError originated just after this operation:\nDF ["a"]; PROJECT */1 COLUMNS; SELECTION: "None"))\nDF ["a"]; PROJECT */1 COLUMNS; SELECTION: "None""))', src/main.rs:17:26


</details>

@trueb2 trueb2 requested review from ritchie46 and orlp as code owners October 29, 2023 20:07
@github-actions github-actions bot added fix Bug fix rust Related to Rust Polars labels Oct 29, 2023
@ritchie46
Copy link
Member

Thanks. Can you add a test on the python side?

@trueb2
Copy link
Contributor Author

trueb2 commented Oct 31, 2023

I am not sure how to test this on the python side. I only use the Rust interfaces personally, so there may be something I am missing. The same test case does not work for python because it collects more aggressively.

Internally, it looks like all of the python interfaces prevent optimizations and recursive error formatting by
returning a data frame after .collect(no_optimization=True)

import polars as pl

def cool_stuff(lf):
    return (
        lf.select([pl.col("foo"), pl.col("bar")])
        .filter(pl.col("foo") > pl.col("bar"))
        .sort("a")
    )

df = pl.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]})
n = 10
for _ in range(n):
    df = cool_stuff(df)
Traceback (most recent call last):
  File "/Users/jwtrueb/Desktop/workspace/polars-recursive-error/plerr.py", line 13, in <module>
    df = cool_stuff(df)
         ^^^^^^^^^^^^^^
  File "/Users/jwtrueb/Desktop/workspace/polars-recursive-error/plerr.py", line 5, in cool_stuff
    lf.select([pl.col("foo"), pl.col("bar")])
  File "/opt/homebrew/lib/python3.11/site-packages/polars/internals/dataframe/frame.py", line 5784, in select
    .collect(no_optimization=True)
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/polars/internals/lazyframe/frame.py", line 1146, in collect
    return pli.wrap_df(ldf.collect())
                       ^^^^^^^^^^^^^
exceptions.ColumnNotFoundError: foo

> Error originated just after operation: '  DF ["a", "b"]; PROJECT */2 COLUMNS; SELECTION: "None"
'
This operation could not be added to the plan.

@ritchie46
Copy link
Member

You must work on LazyFrames:

df = pl.LazyFrame({"a": [1, 2, 3], "b": [1, 2, 3]})

And then you can test the python error message as string format.

@trueb2
Copy link
Contributor Author

trueb2 commented Dec 16, 2023

I think this is ready for review again.

Noticeably, i hit an issue running python locally tests that seems to be known: #11251 (comment)

While this CI suite passes, it fails on local mac without disabling those panic tests. That seems to be related to handling panic handling in the python test runner.
test_panic_error
test_date_range_invalid_time_unit
test_datetime_range_invalid_time_unit

@trueb2 trueb2 requested a review from c-peters as a code owner January 15, 2024 18:48
@stinodego
Copy link
Member

stinodego commented Feb 14, 2024

Thanks @trueb2 - but it seems this was fixed in #8107 which was recently merged.

@stinodego stinodego closed this Feb 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Slow error formatting on large plan presents like hang
3 participants