Reimplementation of NestedExtensionArray #39

Merged
merged 8 commits into from
May 3, 2024

hombit
Collaborator

@hombit hombit commented May 1, 2024

This reimplements NestedExtensionArray so that it inherits directly from ExtensionArray instead of ArrowExtensionArray. This removes the dependency on the unstable ArrowExtensionArray, gives us more control over the behavior, and allows some performance optimizations.

It fixes #15, fixes #24
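
For readers unfamiliar with the pandas extension interface, here is a minimal sketch of what subclassing ExtensionArray directly involves. It is not the code from this PR: `ListDtype` and `ListArray` are hypothetical names, and the storage is a plain NumPy object array rather than the Arrow-backed storage the real class presumably uses. It only illustrates that the subclass itself must supply construction, indexing, `isna`, `take`, `copy`, and concatenation, which is where the extra control over missing values comes from.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray, ExtensionDtype
from pandas.api.indexers import check_array_indexer


class ListDtype(ExtensionDtype):
    """Hypothetical dtype for this sketch (not the project's dtype)."""

    name = "toy_list"
    type = list
    na_value = None  # assumption: missing rows are stored as None

    @classmethod
    def construct_array_type(cls):
        return ListArray


class ListArray(ExtensionArray):
    """Toy array of Python lists that subclasses ExtensionArray directly."""

    def __init__(self, values):
        # Fill element by element so equal-length lists are not
        # turned into a 2-D array by NumPy.
        data = np.empty(len(values), dtype=object)
        for i, value in enumerate(values):
            data[i] = value
        self._data = data

    @classmethod
    def _from_sequence(cls, scalars, *, dtype=None, copy=False):
        return cls(list(scalars))

    @classmethod
    def _concat_same_type(cls, to_concat):
        return cls(np.concatenate([arr._data for arr in to_concat]))

    @property
    def dtype(self):
        return ListDtype()

    @property
    def nbytes(self):
        return self._data.nbytes

    def __len__(self):
        return len(self._data)

    def __getitem__(self, item):
        if isinstance(item, (int, np.integer)):
            return self._data[item]
        item = check_array_indexer(self, item)
        return type(self)(self._data[item])

    def isna(self):
        # We decide ourselves what counts as missing.
        return np.array([value is None for value in self._data], dtype=bool)

    def take(self, indices, allow_fill=False, fill_value=None):
        indices = np.asarray(indices, dtype=np.intp)
        if not allow_fill:
            return type(self)(self._data[indices])
        if (indices < -1).any():
            raise ValueError("with allow_fill=True, indices must be >= -1")
        # -1 marks a position to be filled with the missing value.
        mask = indices == -1
        out = np.empty(len(indices), dtype=object)
        out[~mask] = self._data[indices[~mask]]
        out[mask] = fill_value
        return type(self)(out)

    def copy(self):
        return type(self)(self._data.copy())


arr = ListArray([[1, 2], None, [3]])
series = pd.Series(arr)
```

The resulting `series` reports its missing row through the array's own `isna`, so the handling of missing values is defined in this class rather than inherited from ArrowExtensionArray.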

Change Description

  • My PR includes a link to the issue that I am addressing

Solution Description

Code Quality

  • I have read the Contribution Guide
  • My code follows the code style of this project
  • My code builds (or compiles) cleanly without any errors or warnings
  • My code contains relevant comments and necessary documentation

Project-Specific Pull Request Checklists

Bug Fix Checklist

  • My fix includes a new test that breaks as a result of the bug (if possible)
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)

New Feature Checklist

  • I have added or updated the docstrings associated with my feature using the NumPy docstring format
  • I have updated the tutorial to highlight my new feature (if appropriate)
  • I have added unit/End-to-End (E2E) test cases to cover my new feature
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)

Documentation Change Checklist

Build/CI Change Checklist

  • If required or optional dependencies have changed (including version numbers), I have updated the README to reflect this
  • If this is a new CI setup, I have added the associated badge to the README

Other Change Checklist

  • Any new or updated docstrings use the NumPy docstring format.
  • I have updated the tutorial to highlight my new feature (if appropriate)
  • I have added unit/End-to-End (E2E) test cases to cover any changes
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)

github-actions bot commented May 1, 2024

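| Before [6d09b23] | After [fe53b60] | Ratio | Benchmark (Parameter)       |
|------------------|-----------------|-------|-----------------------------|
| 1.73±1s          | 3.51±2s         | ~2.03 | benchmarks.time_computation |
| 3.5k             | 232             | 0.07  | benchmarks.mem_list         |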

codecov bot commented May 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.35%. Comparing base (6d09b23) to head (cb5c4a5).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #39      +/-   ##
==========================================
+ Coverage   95.68%   97.35%   +1.66%     
==========================================
  Files          14       14              
  Lines         626      794     +168     
==========================================
+ Hits          599      773     +174     
+ Misses         27       21       -6     

@hombit hombit marked this pull request as ready for review May 1, 2024 15:22
@hombit hombit marked this pull request as draft May 1, 2024 16:39
@hombit hombit requested review from dougbrn and wilsonbb May 3, 2024 11:57
@hombit hombit marked this pull request as ready for review May 3, 2024 11:58
Collaborator

@dougbrn dougbrn left a comment

Looks good, just a couple nits/questions that you can feel free to address or ignore!

Review threads on src/nested_pandas/series/ext_array.py: 3 resolved (1 outdated)
Contributor

@wilsonbb wilsonbb left a comment

This looks good to me, though I'm not very knowledgeable about Arrow, so there might be some aspects that I'm missing.

One thing I'm curious about, given the size of the change, is whether we can go ahead and re-run @dougbrn's benchmarking notebook, but I don't think it should strictly block this PR.

Review threads on src/nested_pandas/series/ext_array.py: 3 resolved (1 outdated)
Collaborator

dougbrn commented May 3, 2024

@wilsonbb @hombit I've run my benchmarking notebook on this branch and it looks like performance is basically identical.

@hombit hombit merged commit 109c16c into main May 3, 2024
11 checks passed
@hombit hombit deleted the ext-array-reimpl branch May 3, 2024 20:18
Development

Successfully merging this pull request may close these issues.

  • Proper support of missed values
  • Do not depend on internal pandas API

3 participants