Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: Migrate from hatch to uv #3723

Merged
merged 58 commits into from
Jan 17, 2025
Merged

build: Migrate from hatch to uv #3723

merged 58 commits into from
Jan 17, 2025

Conversation

dangotbanned
Copy link
Member

@dangotbanned dangotbanned commented Dec 29, 2024

Will close #3577

Description

This PR transitions us from hatch to uv with a solution for the issue I raised in (#3577 (comment)).

Note

Don't be alarmed by the +3k additions, it is mostly (89.4%) in uv.lock
More info

Using uv

Install uv for the first time or run:

uv self update

Install python:

uv python install 3.12

Go to the root of the repo:

cd altair/

Initialize a new virtual env:

uv venv -p 3.12

Install/update dependencies:

uv sync --all-extras

Also see CONTRIBUTING

Task Runner

I've been following astral-sh/uv#5903 pretty closely and after experimenting rolling my own runner - I settled on https://github.com/taskipy/taskipy.

Usage

Works in a similar way to hatch run ..., with commands defined in [tool.taskipy.tasks].

Changes

Composition

Composing tasks uses "... && ...", rather than ["...", "..."]

Invoking

Invoking a task:

hatch run generate-schema-wrapper
uv run task generate-schema-wrapper

Alternatives

These were also mentioned in astral-sh/uv#5903.

I chose taskipy as:

  • has enough functionality for what we're doing
  • easy to understand
  • easiest to replace later

The last point is quite apparent when looking at the changes in (00f353a) vs (https://github.com/vega/altair/blob/ba836abb77b8e1eb8b5d194ed9442916ccdbf2a9/tasks.toml)

Tasks

https://ss64.com/bash/mkdir.html
We would want to see an error if `doc/` didn't exist. The docs wouldn't build as all the content is in that directory.

Working on migrating to a cross-platform shell that doesn't require this option anyway https://www.nushell.sh/commands/docs/mkdir.html
Decouples from `bash`, simplifies (#3577 (comment))
Provides 2 interfaces:
- `tasks.py`: to define using decorators
- `tasks.toml`: an example of defining in a similar way to `hatch` scripts

#3577 (comment)
Steps to reproduce:
- `uv self update` or install `uv` for the [first time](https://docs.astral.sh/uv/getting-started/installation/)
- `uv python install 3.12`
- `cd $ALTAIR_REPO_ROOT`
- `uv venv -p 3.12 --seed`
- `uv sync --all-extras`
Bug revealed after changing the `pytest` task to only contain `"pytest"`
- Add docs
- Handle semicolon joining in one place
- align `mkdir_cmd` with `mkdir -p` (https://ss64.com/bash/mkdir.html)
`*_cmd` functions and these variables are unrelated

Planning to namespace the functions, using `cmd`
All of these tools are now accessible under a single namespace, with discoverable docs
You can now safely run:
```py
import tools
tools.tasks.app.run(["lint", "format"])
```
Terms like "clean", "build", "publish" have another meaning when talking about the distribution
https://docs.astral.sh/uv/concepts/projects/build/
- Extra prep for string escaping while in a subprocess
- Use `--force` in `git add` to override (https://github.com/vega/altair/blob/7c2d97ffe80749a7cd1b24703547e40f4918172d/doc/.gitignore#L1)
- Raise a more informative error when a subprocess fails
All previous functionality is possible without this now.
Did some tidying up of what is left using (https://github.com/vega/vega-datasets/blob/369b462f7505e4ef3454668793e001e3620861ff/taplo.toml)

## Remaining sections
- `build`, `metadata`, `version` tables are used by `hatchling`
  - (https://hatch.pypa.io/latest/why/#build-backend)
- The `doc` environment is retained **only** due to how `_HatchRunner` is implemented
  - Otherwise we can remove that
  - (6e792a1)
- The test matrix functionality isn't something I've tried to reproduce, so left that in
  - Unsure how often others use it, but we could swap that out for `nox`
    - https://github.com/narwhals-dev/narwhals/blob/024d5d2294d1dcdd2e2f043d59eb03a3238c9e87/noxfile.py#L13
tools/sync_website.py Outdated Show resolved Hide resolved
@mattijn
Copy link
Contributor

mattijn commented Jan 10, 2025

You should now have access to the Altair docs repo to test things further👍

@mattijn
Copy link
Contributor

mattijn commented Jan 10, 2025

A bit out of scope, but I'm also fine if updating the docs becomes a manual triggered workflow via github actions.

@dangotbanned
Copy link
Member Author

dangotbanned commented Jan 10, 2025

You should now have access to the Altair docs repo to test things further👍

Ah thanks @mattijn!

A bit out of scope, but I'm also fine if updating the docs becomes a manual triggered workflow via github actions.

If I'm unable to get things working in (https://github.com/vega/altair/blob/0ed58e59e312f1af0cc71b72655e4115f229e892/tools/sync_website.py) I'll just revert back to (https://github.com/vega/altair/blob/8a6f65c02e18c9acd3a08e02f73670795c0a7e66/doc/sync_website.sh)

Happy to hold off on making that cross-platform if it becomes a blocker

Update

Resolved in (#3723 (comment))

@dangotbanned
Copy link
Member Author

@binste would really appreciate your approval on this as it closes the issue you raised (#3577)

@mattijn
Copy link
Contributor

mattijn commented Jan 16, 2025

Thanks for all the work here @dangotbanned.
I was hoping this PR would simplify project management for maintenance of the project, but that seems not really yet the case. I've been reading upon this new introduced dependency of taskipy and while I hope it should not be necessary, but for the time being it seems the best way forward, until astral-sh/uv#5903 is resolved.

Another concern I have is the introduction of a lock file. While it’s described as recommended to include it in the repository (https://docs.astral.sh/uv/guides/projects/#uvlock) if you need exact resolvment across OS. Can we avoid this? Why not add it to the .gitignore instead? Our dependencies are already clearly defined in the project.toml file and slight variations in resolution across maintainers shouldn’t be an issue as long as the base requirements are met? I hope we can live without. What do you think?

@dangotbanned
Copy link
Member Author

Hey @mattijn, thanks for taking a look at this PR!

I was hoping this PR would simplify project management for maintenance of the project, but that seems not really yet the case.

I'm sorry to hear you feel that way, but I am hopeful that this is something we can resolve.

Just for my benefit, are these your concerns regarding complexity?

  • The addition of a new dev dependency (taskipy)
  • Introducing a lock file (uv.lock)

I don't quite follow how these are at-odds with simplifying maintenance, could you elaborate on that a bit further for me please?

taskipy

I can understand that adding a new dependency shouldn't be taken lightly.

This package in particular is very minimal and comprised of several small modules (the largest is 163 LOC).
In terms of how this changes maintaining altair; the only difference is how you invoke a command:

hatch run generate-schema-wrapper
uv run task generate-schema-wrapper
hatch test
uv run task test

Personally, I'd prefer something shorter than uv run task ... - but you can always set up an alias in your shell:

hatch run generate-schema-wrapper
# uv run task generate-schema-wrapper
uvt generate-schema-wrapper

I've been reading upon this new introduced dependency of taskipy and while I hope it should not be necessary, but for the time being it seems the best way forward, until astral-sh/uv#5903 is resolved.

I really do believe taskipy is the closest we can get to the simplicity of hatch for this purpose for now.
I'm open to the alternatives - but feel they would be more difficult to move away from after uv adds their own support

uv.lock

Another concern I have is the introduction of a lock file.
...
I hope we can live without.

So there are some benefits to having a lock file that I've come across so far.
I do feel the need though to question why you'd like to avoid one?

https://docs.astral.sh/uv/concepts/projects/sync/

Creating the lockfile

The lockfile is created and updated during uv invocations that use the project environment, i.e., uv sync and uv run. The lockfile may also be explicitly created or updated using uv lock

The file is fully managed by uv and isn't even something you need to think about while working on the project.
If you edit a dependency in pyproject.toml - uv.lock will be updated on the next run only if the current uv.lock doesn't satisfy the constraints. So we still only make changes in one place.

I want to reframe this from

how do we avoid having a lock file?

to

how does a lock file reduce maintenance burden?

Benefits

Consistency between local and CI environment

As an example, today I pushed some commits after passing all tests locally to find ruff warnings appearing in the workflow https://github.com/vega/altair/actions/runs/12810882256/job/35718940621?pr=3631

I'm sure you've run into this issue before - I have quite a few times recently and it really adds up in PRs - all of these are since 5.5.0:

For an active maintainer, it is frustrating to context switch from a PR to fix an unrelated issue that wasn't present during local testing.
For a new contributor, they may not understand they'd only see these warnings locally by reinstalling their environment. My first PR had a CI issue which I definitely didn't know how to fix - but could've been prevented by a lock file.

In either case - having uv.lock eliminates this issue and ensures every environment uses the same versions.
The concern for us is less about OS and more about consistent package versions.

Faster installs without complexity

Given that uv.lock contains a fully resolved environment - we benefit from faster installs both locally and in CI.
Speed is one thing, but you don't need to worry about how - just sync the project and this is all handled for us

altair/CONTRIBUTING.md

Lines 72 to 75 in 3e17cbf

Install the project with all development dependencies:
```cmd
uv sync --all-extras
```

I want to explore this some more in #3725 - where I'm hoping we'll end up with a simpler, faster workflow and be able to reproduce it locally.

I ended up writing quite a bit more (sorry) than I expected on this one 😅
I'm hoping that I was able to illustrate how the switch to uv solves issues beyond those mentioned in #3577

@mattijn
Copy link
Contributor

mattijn commented Jan 16, 2025

I can live with the added taskipy, since it is something temporary.

Regarding the lock file, now every time a subdependency is changing, we will see this related diff coming up in a PR.

We worked hard to reduce the number of files in the root directory for maintenance to almost a single pyproject.toml file.

It's easier to add new files or dependencies than to reduce them, so if we can avoid adding these, that is a win for all.

All in all, I'm not really a big fan of a lock file. We can synchronize to a single unified developer experience, but it won't change the issues users face.
I prefer that we as maintainers experience these issues in environments and during review.

I hadn't seen #3725, that seems interesting.

Let me approve anyways, hopefully our initial statement holds and will the transition towards uv increase the influx of maintainers.

@dangotbanned
Copy link
Member Author

Thanks for approving @mattijn

Regarding the lock file, now every time a subdependency is changing, we will see this related diff coming up in a PR.

Here are some examples of actions leading to a uv.lock diff:

Upgrading a dependency

This one is syncing after a merge

This one was after adding tomli to dev in pyproject.toml

@dangotbanned
Copy link
Member Author

All in all, I'm not really a big fan of a lock file. We can synchronize to a single unified developer experience, but it won't change the issues users face.
I prefer that we as maintainers experience these issues in environments and during review.

@mattijn I've thought some more about this and think I have an idea that gives us the benefits of both

  • GitHub Action scheduled at some interval (e.g. weekly, monthly) that adds a PR upgrading our lock file and running the test suite
  • If our CI is green, it could merge to main without review
  • Otherwise, it should block subsequent runs and require a maintainer to address the issues in that PR to merge

So we get the failures isolated to a single branch (one less concern for new contributors).
But we also continue to know well-ahead of release that we may have a dependency issue (preserving the current feedback loop).


I would split this into another issue to plan out how we'd achieve this.
However, after reading these I feel convinced we have options:

Needed to get in sync with `main`, like 5625e9e

Related #3771
Failure caused by stale resolution following #3764 fix
Upgrading everything so that when this merges it will be equivalent to no `uv.lock` (which is what `main` is now)

https://github.com/vega/altair/actions/runs/12828370500/job/35772253406?pr=3723
@dangotbanned dangotbanned mentioned this pull request Jan 17, 2025
6 tasks
@dangotbanned dangotbanned merged commit 21f5849 into main Jan 17, 2025
25 checks passed
@dangotbanned dangotbanned deleted the migrate-hatch-scripts branch January 17, 2025 12:00
@mattijn
Copy link
Contributor

mattijn commented Jan 17, 2025

I think your suggestion is also a step towards #3574.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switch from hatch to uv for package and environment management
2 participants