Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pygmt.grdcut: Refactor to store output in virtualfiles for grids #3115

Draft
wants to merge 86 commits into
base: main
Choose a base branch
from

Conversation

seisman
Copy link
Member

@seisman seisman commented Mar 17, 2024

This PR tries to get rid of temporary files from pygmt.grdcut (#2730).

The GMT_GRID wrapper (PR #2398) can store output 2-D grid in virtual files, but doesn't work for output images. PR #3128 will wrap GMT_IMAGE to store output images in virtual files, but it's still WIP.

This PR makes necessary changes to make pygmt.grdcut work for both grids and images. For images, we still use temporary files but it should be refactored again after PR #3128.

seisman and others added 2 commits March 20, 2024 11:00
pygmt/src/grdcut.py Outdated Show resolved Hide resolved
pygmt/clib/session.py Outdated Show resolved Hide resolved
pygmt/src/grdcut.py Outdated Show resolved Hide resolved
@seisman seisman changed the title WIP: Initial try to support raster images using temporary files WIP: pygmt.grdcut: Support both grids and images Mar 27, 2024
@seisman seisman force-pushed the datatypes/gmtgrid branch 3 times, most recently from d8f1160 to d0517e0 Compare April 1, 2024 07:11
@seisman seisman changed the base branch from datatypes/gmtgrid to main April 1, 2024 11:32
@seisman seisman added the enhancement Improving an existing feature label Apr 19, 2024
weiji14 added a commit that referenced this pull request Jun 18, 2024
Special case `earth_day` to be loaded as GMT_IMAGE rather than GMT_GRID. Need to use `GMT_OUT|GMT_IS_REFERENCE` in virtualfile_out to avoid segfault, xref #3115.
Extra metadata from the _GMT_GRID_HEADER struct.
Reorder the dimensions to follow Channel, Height, Width (CHW) convention. Also added doctest checking output DataArray object and the image's x and y coordinates.
Get the registration and gtype info from the grid header and apply it to the GMT accessor attributes.
Trying to match some of the doctests in _GMT_GRID.
pygmt/clib/session.py Outdated Show resolved Hide resolved
pygmt/clib/session.py Outdated Show resolved Hide resolved
pygmt/src/grdcut.py Outdated Show resolved Hide resolved
pygmt/src/grdcut.py Outdated Show resolved Hide resolved
pygmt/clib/session.py Outdated Show resolved Hide resolved
pygmt/src/grdcut.py Outdated Show resolved Hide resolved
pygmt/src/grdcut.py Outdated Show resolved Hide resolved
Comment on lines +2330 to +2339
try:
img = lib.read_data(infile=raster, kind="image", mode="GMT_CONTAINER_ONLY")
return "image" if img.contents.header.contents.n_bands == 3 else "grid"
except GMTCLibError:
pass
try:
_ = lib.read_data(infile=raster, kind="grid", mode="GMT_CONTAINER_ONLY")
return "grid"
except GMTCLibError:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double-checking that there's not a function in GMT C already to determine grid/image type? I know there's GMT_Inquire_VirtualFile for virtualfiles, but is there not one for just files? I saw this GMT_Get_Info, but unsure if it'll be usable here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/GenericMappingTools/gmt/blob/7809736ba32d87a4a96b15444419eb176c6a35f3/src/gmt_grdio.c#L3377

gmt_raster_type is the function that GMT uses to determine the raster type, but it's not a public API function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oo, magic byte parsing 😆 If that function becomes public someday, maybe we'll use it.

@@ -2311,3 +2312,29 @@ def extract_region(self) -> np.ndarray:
if status != 0:
raise GMTCLibError("Failed to extract region from current figure.")
return region


def _raster_kind(raster: str) -> Literal["grid", "image"] | None:
Copy link
Member

@weiji14 weiji14 Sep 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mypy is complaining about the output type hint here:

pygmt/src/grdcut.py:108: error: Incompatible types in assignment (expression has type "Literal['grid', 'image'] | None", variable has type "Literal['grid', 'image']")  [assignment]

Maybe we should just use Literal['grid', 'image'] and raise GMTInvalidInput if it is not a grid/image instead of returning None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe falling back to "grid" and then let GMT itself decide if it can process it or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, with a binary image/grid output, maybe just make a function like def _is_image() -> bool?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grdcut can also cut a cube, so the _raster_kind function should return grid/image/cube if possible. Currently, the' cube' support is not yet finished (PR #3150). So, the name _raster_kind makes more sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, and then there would be 3 try-except statements. Now wishing we could re-use the function from GMT C...

Base automatically changed from datatypes/gmtimage to main September 29, 2024 08:11
Comment on lines +2332 to +2334
# The logic here is because: an image can be read into a grid container, but a grid
# can't be read into an image container. So, try to read the file as an image first.
# If fails, try to read it as a grid.
Copy link
Member Author

@seisman seisman Sep 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out the logic here is not 100% correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • A grid can be read into an image container, as long as another libgdal package (maybe libgdal-netcdf or libgdal-hdf5) is installed. xref:

The first try-except check is to try reading into a GMT_IMAGE struct, and return "image" if there are 3 bands, and "grid" if there is only 1 band. Even if grids can be read into GMT_IMAGE, we will still call it a grid if it has 1-band right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to refresh my memory about what's happening:

  • Reading a grid into GMT_GRID: Works!
  • Reading an image into GMT_GRID: Only the first band is read.
  • Reading a grid into GMT_IMAGE: It depends on libgdal-hdf5 being installed. Read as an single-band image if installed, otherwise raise errors.
  • Reading an image into GMT_IMAGE: works.
import pygmt
pygmt.grdcut("@earth_relief_01d_p", region=[0, 10, 0, 10])

If libgdal-hdf5 is not installed, _raster_kind first read it as an image, which fails and reports following errors:

ERROR 4: `/home/runner/.gmt/server/earth/earth_relief/earth_relief_01d_p.grd' not recognized as being in a supported file format. It could have been recognized by driver HDF5, but plugin gdal_HDF5.so is not available in your installation. You may install it with 'conda install -c conda-forge libgdal-hdf5'
Error: ession [ERROR]: Unable to open earth_relief_01d_p.grd.
Error: ession [ERROR]: ERROR reading image with gdalread.
pygmt-session (gmtapi_import_image): Could not read from file [earth_relief_01d_p.grd]
[Session pygmt-session (43)]: Error returned from GMT API: GMT_IMAGE_READ_ERROR (22)
[Session pygmt-session (43)]: Error returned from GMT API: GMT_IMAGE_READ_ERROR (22)

then it tries to read it as a grid, which succeeds. But we still see the error messages above. I believe we need to find a way to suppress these errors or find an alternative way to determine the data kind.

@seisman seisman removed run/benchmark Trigger the benchmark workflow in PRs run/test-gmt-dev Trigger the GMT Dev Tests workflow in PR labels Sep 30, 2024
@seisman seisman self-assigned this Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improving an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants