Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use custom resolver for query and eval with nested frames. #150

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

gitosaurus
Copy link

Verify preflighting of nested expressions using AST visitation.

Remove logic for splitting queries by string. Now the evaluation is handled by a nested column resolver, and the mixed-mode expressions are preflighted by examining the parsed abstract syntax tree for the query expression.

Resolves #59.

Change Description

  • My PR includes a link to the issue that I am addressing

Solution Description

Code Quality

  • I have read the Contribution Guide
  • My code follows the code style of this project
  • My code builds (or compiles) cleanly without any errors or warnings
  • My code contains relevant comments and necessary documentation

Project-Specific Pull Request Checklists

Bug Fix Checklist

  • My fix includes a new test that breaks as a result of the bug (if possible)
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link

codecov bot commented Oct 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.47%. Comparing base (12d1293) to head (67c49ff).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #150   +/-   ##
=======================================
  Coverage   99.47%   99.47%           
=======================================
  Files          14       14           
  Lines         948      955    +7     
=======================================
+ Hits          943      950    +7     
  Misses          5        5           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Verify preflighting of nested expressions using AST visitation.

Remove logic for splitting queries by string.  Now the evaluation is
handled by a nested column resolver, and the mixed-mode expressions
are preflighted by examining the parsed abstract syntax tree for
the query expression.
Also remove `_ensure_spacing`, no longer used.
Copy link

github-actions bot commented Oct 9, 2024

Before [12d1293] After [9f4a1fb] Ratio Benchmark (Parameter)
255M 258M 1.01 benchmarks.AssignSingleDfToNestedSeries.peakmem_run
272M 274M 1.01 benchmarks.ReassignHalfOfNestedSeries.peakmem_run
83.9M 83.9M 1 benchmarks.NestedFrameAddNested.peakmem_run
9.11±0.05ms 9.09±0.03ms 1 benchmarks.NestedFrameAddNested.time_run
90.8M 90.8M 1 benchmarks.NestedFrameQuery.peakmem_run
87.4M 87.5M 1 benchmarks.NestedFrameReduce.peakmem_run
1.62±0ms 1.62±0.01ms 1 benchmarks.NestedFrameReduce.time_run
7.05±0.07ms 6.80±0.06ms 0.96 benchmarks.NestedFrameQuery.time_run
29.7±2ms 28.3±2ms 0.95 benchmarks.AssignSingleDfToNestedSeries.time_run
55.5±2ms 52.6±1ms 0.95 benchmarks.ReassignHalfOfNestedSeries.time_run

Click here to view all benchmarks.

This logic was copied from pd.DataFrame.query, and the accompanying
comment said that it was to handle an occasional case where
`self.loc[b]` would raise an error on a multi-dimensional `b`, but
`self[b]` would succeed.  I can't cause this error with `.loc` anymore,
so the code coverage complains about the unexercised exception
clause.  Removing it since the limitation that inspired it seems
to be gone now.
Copy link
Collaborator

@hombit hombit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

I'd ask for more testing and not-implemented error-handling. For example, I'm not sure that assignment with eval is supported: .eval("nested.c = nested.a + nested.b") or .eval("nested.b = x + nested.a").

It looks like it may also close #144 and #146? If yes, could you please add tests for covering those issues?

.gitignore Outdated Show resolved Hide resolved
src/nested_pandas/nestedframe/utils.py Outdated Show resolved Hide resolved
src/nested_pandas/nestedframe/core.py Outdated Show resolved Hide resolved
src/nested_pandas/nestedframe/core.py Outdated Show resolved Hide resolved
@gitosaurus gitosaurus linked an issue Oct 9, 2024 that may be closed by this pull request
3 tasks
@gitosaurus
Copy link
Author

Thank you!

I'd ask for more testing and not-implemented error-handling. For example, I'm not sure that assignment with eval is supported: .eval("nested.c = nested.a + nested.b") or .eval("nested.b = x + nested.a").

It looks like it may also close #144 and #146? If yes, could you please add tests for covering those issues?

Yes, this PR does resolve #146, thanks for the tip! #144 is not resolved by this PR, but perhaps this same approach could be extended to cover it in a separate PR.

Good catch on the assignment problem. The existing ability for eval() to insert base columns still works, but you're right, it can't insert into the nest yet. I'll add those tests and see if adding nested columns can be done within the scope of this PR. If not, I'll at least make a clearer error.

Prevent users from depending on the public interface, and also
from assuming that a `NestedSeries` has a nest within it, since
that is the existing meaning of that prefix.
Copy link
Collaborator

@dougbrn dougbrn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @gitosaurus , looks really good! I added a few comments, in addition to the feedback Kostya has provided.

src/nested_pandas/nestedframe/core.py Show resolved Hide resolved
src/nested_pandas/nestedframe/core.py Outdated Show resolved Hide resolved
src/nested_pandas/nestedframe/core.py Outdated Show resolved Hide resolved
raise ValueError(msg)
kwargs["level"] = kwargs.pop("level", 0) + 1
kwargs["target"] = None
# At present, the query expression must be either entirely within the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally fine to keep this behavior, but if you believe this new implementation would support multi-target queries this could be a restriction we'd lift. Maybe opening a separate issue for this would be best

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this PR makes a significant step toward multi-target queries, but it might be better to make the next step in a separate PR, for simplicity's sake. Will investigate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

query("nested.b.isna()") fails Nested-pandas query does not handle scientific notation
3 participants