Additional python packaging documentation, version correction #9405

Merged
merged 3 commits into from
Mar 16, 2024

Conversation

vilhelmen
Contributor

@vilhelmen vilhelmen commented Mar 7, 2024

What does this PR do?

Updates docs on building numpy support. I've been maintaining this SO answer on how to fix it for a year now and I just ran into a new problem installing the numpy bindings. I have tested it: wheel and setuptools cannot be installed in the same command as gdal[numpy]; they must be installed beforehand.

There is also a tweak to the version of setuptools in the pyproject.toml I determined through experimentation. Pip, or setuptools, or whatever completely ignores this string (happens in my packages, too), but at least it's accurate now. On its own, pip will think >=48 is good enough, which is incredibly wrong for multiple reasons. I'm unaware of a fix, I had the issue back in setuptools 64.

What are related issues/pull requests?

None

Tasklist

  • Review
  • Adjust for comments
  • All CI builds and checks have passed

Environment

ghcr.io/osgeo/gdal:ubuntu-full-3.8.3 with python 3.12.1
Gdal 3.8.4 via homebrew and Python 3.12.2 on Mac

@coveralls
Collaborator

coveralls commented Mar 7, 2024

Coverage Status

coverage: 68.866% (+0.03%) from 68.833%
when pulling eff0fef on vilhelmen:python_packaging
into fd0c4f4 on OSGeo:master.

@pl-kevinwurster

pl-kevinwurster commented Mar 7, 2024

@rouault I downloaded the source distribution (direct link to .tar.gz) of the latest GDAL Python package release, and I notice it does not contain a pyproject.toml, and the setup.py does not have the GDAL_PYTHON_BINDINGS_WITHOUT_NUMPY changes introduced in #8926. Is this expected? I don't fully understand the GDAL release cycle, but the 3.8.4 Python package was released on 2024-02-18, so I was expecting to see the changes from #8926.


@vilhelmen I have owed the project some additional documentation on the topic of $ pip install gdal for quite some time – thanks for doing this.

I think your changes need some additional detail, which I have so far failed to provide to the project. As noted above, I'm surprised that the PyPI package does not reflect these changes yet, so I have not verified my suggestions.

The gdal Python package provides bindings for OGR (vector via osgeo.ogr), GDAL (raster via osgeo.gdal), and OSR (coordinate transformations via osgeo.osr). OSR has no additional dependencies, so we can ignore it. The raster side (osgeo.gdal) has a dependency on Numpy, but the vector side (osgeo.ogr) does not. Users can install the gdal package in a mode where it enables support for only vector, or both raster and vector. As far as I know, OSR support is always enabled.

The gdal package does not provide a wheel build – only a source distribution. Currently, users are always building from source. Numpy is sometimes a build requirement, which means it cannot be listed in setup_requires because Python's build/install tooling does not provide a mechanism for altering setup_requires at build or install time. #8926 made the build process default to requiring Numpy, however users only wanting OGR have the option to disable this requirement via the GDAL_PYTHON_BINDINGS_WITHOUT_NUMPY environment variable.

The commands below should possibly use the --no-binary gdal option as well, but I don't fully understand the implications of building a wheel at install-time for a complicated source distribution like GDAL.

Users wanting to install gdal with only vector support should be able to use:

$ GDAL_PYTHON_BINDINGS_WITHOUT_NUMPY= pip install "gdal"

Users wanting to install gdal with raster support must ensure that Numpy is installed before attempting to build/install gdal. Users should then also specify the numpy "extras" to let pip double check that the dependency tree is satisfied. As far as I can tell, it is not possible to install both packages in one $ pip install invocation. pip does not guarantee package install order, and does not provide a mechanism for injecting a package into some other package's build environment.

$ pip install numpy
$ pip install "gdal[numpy]"
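
The two-step order above can also be scripted. This is only a minimal sketch, not something GDAL ships: the helper name `numpy_then_gdal_steps` is hypothetical, and it only builds the pip command lines so that nothing gets installed as a side effect.

```python
import sys


def numpy_then_gdal_steps(gdal_requirement="gdal[numpy]"):
    """Return pip argv lists in the order they have to run.

    Hypothetical helper for illustration only.
    """
    pip_install = [sys.executable, "-m", "pip", "install"]
    return [
        # Numpy goes in its own invocation so it is present before gdal is built.
        pip_install + ["numpy"],
        # The "numpy" extra lets pip double check the dependency tree afterwards.
        pip_install + [gdal_requirement],
    ]


if __name__ == "__main__":
    for argv in numpy_then_gdal_steps():
        print(" ".join(argv))
```

To actually run the steps, each list could be passed to `subprocess.run(argv, check=True)` in order.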

Users can verify that raster support has been installed with:

$ python3 -c "from osgeo import gdal_array"
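
The same check can be done from inside a script instead of a one-liner. A minimal sketch, assuming a missing `gdal_array` surfaces as an `ImportError`; the function name `has_module` is made up for this example.

```python
import importlib


def has_module(name="osgeo.gdal_array"):
    """Return True if the named module imports cleanly, False otherwise."""
    try:
        importlib.import_module(name)
    except ImportError:
        # Covers both "osgeo is not installed" and "gdal_array was not built".
        return False
    return True


if __name__ == "__main__":
    print("gdal_array importable:", has_module())
```

A package that depends on gdal could call this at startup and print install instructions instead of failing later with a bare traceback.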

Vector support should always be enabled.

It is also possible for users wanting raster support to build a version of gdal locally without Numpy, and have it be cached by pip. For example, these commands:

# Assume Numpy is already installed

# No raster support
$ pip install gdal

# Oops, I need raster support. Let me uninstall.
$ pip uninstall gdal

# And reinstall. This picks up the previously built version of 'gdal' that was built without raster support.
$ pip install numpy
$ pip install "gdal[numpy]"

will still result in an install that does not have raster support. To get out of this situation, users must instruct pip to not use its cache:

$ pip install gdal

# Users can use this command
$ pip install gdal --force-reinstall --no-cache

# Or they can explicitly uninstall and reinstall GDAL - the '--no-cache' flag is _very_ important.
$ pip uninstall gdal
$ pip install gdal --no-cache
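
For anyone automating the recovery, the two options above could be sketched like this. The helper `cache_bypass_steps` is hypothetical, and it spells the flag as `--no-cache-dir`, which is the documented long form of the `--no-cache` used above. It only builds command lines and does not run pip.

```python
import sys


def cache_bypass_steps(package="gdal", requirement=None, force=True):
    """Return pip argv lists that rebuild `package` without pip's wheel cache.

    Hypothetical helper for illustration only.
    """
    pip = [sys.executable, "-m", "pip"]
    target = requirement or package
    if force:
        # Option 1: a single command that ignores the installed copy and the cache.
        return [pip + ["install", "--force-reinstall", "--no-cache-dir", target]]
    # Option 2: explicit uninstall, then an install that skips the cache.
    return [
        pip + ["uninstall", "-y", package],
        pip + ["install", "--no-cache-dir", target],
    ]


if __name__ == "__main__":
    for argv in cache_bypass_steps(force=False):
        print(" ".join(argv))
```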

A major downside to this setup is that any package depending on gdal must instruct its users to pre-install numpy before installing the package they are interested in, because that package in turn depends on GDAL. This is an issue we were grappling with in isofit/isofit#420. ISOFIT has since removed its dependency on the gdal Python package, but install instructions would have looked like:

$ pip install numpy
$ pip install isofit

In an ideal world users would be able to explicitly define which portions of GDAL they want with extras_require, but the Python packaging toolchain does not seem to have the necessary features for this:

$ pip install gdal[ogr,gdal]

These problems are really hard to solve, and any solution would require the GDAL maintainers to take on an even larger burden than the current setup.


::

pip install wheel "setuptools>=67"

For at least newer (very new?) versions of $ pip, I don't think wheel has to be explicitly installed. I think it is automatically installed as part of the build process?

Looking at the setuptools changelog, it isn't really clear to me what feature in v67 is required for installing GDAL. Maybe this?

#3790: Bump vendored version of packaging to 23.0 (pyparsing is no longer required and was removed). As a consequence, users will experience a more strict parsing of requirements. Specifications that don’t comply with PEP 440 and PEP 508 will result in build errors.

Contributor Author

Uninstalled wheel and confirmed it's still necessary as of (seemingly) latest pip 24.0:

  Running command Getting requirements to build wheel
  WARNING: numpy not available!  Array support will not be enabled

I'm not up to date with the bleeding edge of packaging, but I was still under the impression that wheel was necessary for anything compiled/non-sdist.

I think the venvs based off system python are secretly finding their way to wheel from the system install, which is why it makes it look like setuptools and wheel aren't needed, but I haven't done any tracing to verify.

All I can say for setuptools is that I tried setuptools==64 through setuptools==69 and 67.0 made it work.
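
A quick way to see whether an environment already meets that empirical floor is sketched below. The function names are made up for this example, and the version comparison is a deliberate simplification that only looks at the leading numeric components; it is not full PEP 440 parsing.

```python
from importlib import metadata


def release_tuple(version):
    """Leading numeric components of a version string, e.g. '67.0.0' -> (67, 0, 0)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric piece such as 'post1' or '0rc1'
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum="67"):
    """Compare two versions after padding them with zeros to the same length."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_setuptools_ok(minimum="67"):
    """True if setuptools is installed and at least `minimum`."""
    try:
        return meets_minimum(metadata.version("setuptools"), minimum)
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("setuptools >= 67:", installed_setuptools_ok())
```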

FWIW, here's a trimmed down Dockerfile of what I'm working with:

ARG BASE_CONTAINER=ghcr.io/osgeo/gdal:ubuntu-full-3.8.4
FROM $BASE_CONTAINER

# Base tools and gdal stuff
RUN export DEBIAN_FRONTEND=noninteractive \
  && apt-get update && apt-get install -y \
    ca-certificates \
    nano tmux htop less git jq wget \
    gcc g++ make cmake \
    software-properties-common \
  && apt-add-repository -yn ppa:deadsnakes/ppa \
  && apt-get purge --autoremove -y software-properties-common \
  && echo 'UGFja2FnZTogKgpQaW46IHJlbGVhc2Ugbz1MUC1QUEEtZGVhZHNuYWtlcwpQaW4tUHJpb3JpdHk6IDYwMAo=' | base64 -d | tee /etc/apt/preferences.d/snakes_prefer \
  && apt-get update \
  && apt-get install -y python3.12 python3.12-dev python3.12-distutils python3.12-venv \
  && update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 10 \
  && python3.12 -m ensurepip --upgrade --default-pip \
  && apt-get install -y geotiff-bin \
  && rm -rf /var/lib/apt/lists/*

# NO GDAL YOU'RE BAD. Reinstall it! May need to blast numpy tbh.
RUN pip3 install --no-cache --upgrade 'setuptools>=67' 'pip>=23' wheel 'numpy>=1.24' \
  && pip3 uninstall -y GDAL \
  && pip3 install --no-cache --no-cache-dir --upgrade 'GDAL[numpy]=='"$(gdal-config --version)"
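
The last line pins the Python package to whatever libgdal version `gdal-config` reports. As a rough sketch of that idea outside a Dockerfile (the function names are hypothetical, and the optional `.*` suffix is ordinary PEP 440 prefix matching):

```python
import subprocess


def gdal_requirement(libgdal_version, numpy=True, wildcard=False):
    """Build a pip requirement string matching the installed libgdal version."""
    name = "GDAL[numpy]" if numpy else "GDAL"
    suffix = ".*" if wildcard else ""
    return f"{name}=={libgdal_version.strip()}{suffix}"


def libgdal_version():
    """Ask gdal-config for the libgdal version (requires gdal-config on PATH)."""
    result = subprocess.run(
        ["gdal-config", "--version"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Uses a literal version so the example runs without libgdal installed.
    print(gdal_requirement("3.8.4"))
```

On a machine with libgdal installed, `gdal_requirement(libgdal_version())` would produce the string to hand to pip.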

I'm not up to date with the bleeding edge of packaging, but I was still under the impression that wheel was necessary for anything compiled/non-sdist.

Historically I don't think this is the case, but I'm more than willing to be wrong. All of the different configurations in the current packaging world are super confusing.

I looked through the pip source code, and this message:

Running command Getting requirements to build wheel

means that pip is installing wheel in the build environment. I couldn't figure out exactly how, though.

Doing some more experimenting, it does seem like wheel is required to install the gdal source distribution. I would have expected the version of wheel that pip seems to install in the build environment to satisfy this requirement, but I guess not.

So, yes, it seems like your install command here is correct.

@rouault
Member

rouault commented Mar 7, 2024

I downloaded the source distribution (direct link to .tar.gz) of the latest GDAL Python package release, and I notice it does not contain a pyproject.toml, and the setup.py does not have the GDAL_PYTHON_BINDINGS_WITHOUT_NUMPY changes introduced in #8926. Is this expected?

Yes, those changes have been made only on the master / 3.9.0dev branch. The 3.8.x point releases are done from a separate maintenance branch, and I didn't feel #8926 was appropriate for backport.

@pl-kevinwurster

pl-kevinwurster commented Mar 7, 2024

@rouault Ah thanks for the context!


@vilhelmen in that case, my general explanation is the same, but users installing gdal today will need slightly different commands.

Users wanting only vector support should use:

$ pip install gdal

and users wanting raster + vector support should use:

$ pip install numpy
$ pip install gdal

Users wanting raster support can still verify their gdal install with:

$ python3 -c "from osgeo import gdal_array"

I updated my previous comment with a note about how to get out of a situation where pip has cached a version of gdal without raster support, and how to fix it.

Throw out section in Unix because it doesn't really say anything

Ty for the notes @pl-kevinwurster
@vilhelmen
Contributor Author

vilhelmen commented Mar 9, 2024

@pl-kevinwurster Thanks for the additional notes. I've lost many hours to python packaging and pip caching, so I feel your pain.

I've fleshed out the pip instructions so they should actually be useful to people now (thoughts and prayers to Conda users). I've only eyeballed the RST in my text editor so I'm not 100% on the formatting. Is there an easy way to check it from CI or point the docs page to this commit? Looks like the CI HTML zip artifact doesn't contain the new page?

Out of scope for this PR, and I know it would be a pain to verify, but how certain are we on numpy 1.0.0?

@pl-kevinwurster

Out of scope for this PR, and I know it would be a pain to verify, but how certain are we on numpy 1.0.0?

At this point the gdal python package specifies python_requires='>=3.8.0', I would expect versions of Numpy compatible with Python 3.8 to also set an appropriate python_requires.

Have you seen people trying to install gdal with very old versions of Numpy? I would mostly expect the Numpy dependency to "just work" at this point.

@vilhelmen
Contributor Author

You've got a good point. If someone is working out of a hard 1.0.0 requirement, they should already know they're in for a bad time.

Just wanted to bring it up since we're touching the file.

@pl-kevinwurster

@vilhelmen Numpy 1.0 was released 18 years ago. The gdal Python package is tested using the oldest-supported-numpy package, so for each version of Python, the oldest version of Numpy supported according to python_requires is used for testing. IMO this is enough testing, and anyone attempting to force an installation with a version of Numpy that is incompatible with a given version of Python, or the gdal Python package itself, is on their own.

There are some differences based on CPU architecture, but generally the oldest version of Numpy compatible with Python 3.8 is 1.17.3. I have no idea what, if any, version of the gdal Python package would work with a version of Numpy that old. Possibly things would work just fine given how GDAL actually uses Numpy, or possibly they wouldn't work at all!!

@rouault
Member

rouault commented Mar 13, 2024

The gdal Python package is tested using the oldest-supported-numpy

Given how we build the bindings, I've some doubts our CI actually triggers this new pyproject.toml requires. But it would be safe to indicate that as the minimum version.

Possibly things would work just fine given how GDAL actually uses Numpy, or possibly they wouldn't work at all!!

The gdal_array module makes very primitive use of Numpy, so it could potentially work with very ancient versions (and I actually tested against numpy 2.0dev a couple of weeks ago, and things seemed to work fine too). But yes, it is hard to be sure how old, as in practice people test against quite recent versions.
I believe the oldest version tested by our CI is the python3-numpy 1.17 package of Ubuntu 20.04.


To install with numpy support, you need to require the optional numpy component:
In order to enable raster support, libgdal and its development headers must be installed as well as the Python packages numpy, setuptools, and wheel.
Member

"raster support". Well GDAL without numpy does support raster, it is just that it is not very convenient if you need to inspect array values, but you can for example script raster conversion with gdal.Translate() without requiring numpy. Maybe use "numpy-based raster support" ? (here and below). The sentence "The base GDAL package contains support for OGR, OSR, and GDAL vectors:" above should also be reworked, as it gives the impression that there's no raster support at all
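
To illustrate the point about scripting raster conversion without numpy, here is a minimal sketch. The function name and file paths are placeholders invented for this example, and the import is guarded so the snippet still loads where the bindings are not installed.

```python
try:
    from osgeo import gdal
except ImportError:  # GDAL Python bindings are not installed in this environment
    gdal = None


def convert_to_geotiff(src_path, dst_path):
    """Convert a raster to GeoTIFF with gdal.Translate(); numpy is never imported."""
    if gdal is None:
        raise RuntimeError("the osgeo.gdal bindings are not installed")
    gdal.UseExceptions()  # raise Python exceptions instead of returning None on failure
    dataset = gdal.Translate(dst_path, src_path, format="GTiff")
    dataset = None  # dropping the reference closes the dataset and flushes it to disk


if __name__ == "__main__":
    if gdal is None:
        print("osgeo.gdal is not available, nothing to convert")
    else:
        print("call convert_to_geotiff('input.png', 'output.tif') with real paths")
```

Numpy only becomes necessary once pixel values need to be read into arrays, for example through `gdal_array`.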

pip install gdal[numpy]=="$(gdal-config --version).*"


Users can verify that reaster support has been installed with:
Member

Suggested change
Users can verify that reaster support has been installed with:
Users can verify that raster support has been installed with:

@rouault rouault merged commit 49b95c1 into OSGeo:master Mar 16, 2024
32 checks passed