Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: declare pyarrow optional for 3.12 #385

Merged
merged 4 commits into from
Oct 12, 2023
Merged

Conversation

agoose77
Copy link
Collaborator

@agoose77 agoose77 commented Oct 10, 2023

I noticed that we were failing in CI due to Arrow being installed from sdist on Python 3.12.

In future, perhaps these large binary dependencies should not be installed from source as a rule. That might be a forward-looking thing to do, we also need to just ensure we don't install PyArrow here, because it's not yet available on 3.12.

This PR changes our CI to avoid pulling in pyarrow via dask, and also ensures that our own pyproject.toml just skips pyarrow and aiohttp if we are on 3.12

@codecov-commenter
Copy link

codecov-commenter commented Oct 10, 2023

Codecov Report

Merging #385 (71916e4) into main (c037cbd) will not change coverage.
The diff coverage is n/a.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@           Coverage Diff           @@
##             main     #385   +/-   ##
=======================================
  Coverage   93.23%   93.23%           
=======================================
  Files          24       24           
  Lines        3043     3043           
=======================================
  Hits         2837     2837           
  Misses        206      206           

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@agoose77
Copy link
Collaborator Author

@douglasdavis would you mind taking a look at our dependence upon pyarrow? I assume we want it to be optional here, even in our tests if it's not installed?

@douglasdavis
Copy link
Collaborator

@douglasdavis would you mind taking a look at our dependence upon pyarrow?

@agoose77, sure! as of right now the sample datasets used for a lot of tests are instantiated in a pytest fixture via a dak.from_parquet call; which is why pyarrow is a hard dependency to run the test suite

I assume we want it to be optional here, even in our tests if it's not installed?

I think this would be a good idea in the long run. A short task would be to pytest.importorskip("pyarrow") but this would lead to many tests getting skipped. A longer but better task would be to transition away from using small parquet sample datasets for running the tests to small-something-else. Originally I wrote small JSON samples, but since parquet supported column optimization I transitioned to using Parquet samples just to get a little more implicit testing of column optimization.

I'm happy to spend some time thinking and working on this

@douglasdavis douglasdavis merged commit 507c908 into main Oct 12, 2023
20 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants