Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clib: Add the virtualfile_out method for creating output virtualfile #3057

Merged
merged 12 commits into from
Feb 28, 2024

Conversation

seisman
Copy link
Member

@seisman seisman commented Feb 20, 2024

Currently, PyGMT has provided several methods for manipulating virtual files. These methods are:

  • open_virtualfile: Open a virtual file and close it when done.
  • virtualfile_from_vectors: Create a virtual file for a table file from 1-D arrays
  • virtualfile_from_matrix: Create a virtual file for a table file from a 2-D array
  • virtualfile_from_grid: Create a virtual file for a grid file from a xarray.DataArray object
  • virtualfile_from_data: Convenience function which wraps virtualfile_from_vectors/virtualfile_from_matrix/virtualfile_from_grid and more
  • read_virtualfile: Read data from a virtual file

All the virtualfile_from_* methods are for input data. This PR provides a new method virtualfile_out which can create a virtual file for output data. This method is necessary to get rid of temporary files for modules that write tables/grids (#2730).

The same codes are already used in PR #2729 and #2398. This PR is created to make those two PRs smaller and easier to review.

Preview: https://pygmt-dev--3057.org.readthedocs.build/en/3057/api/generated/pygmt.clib.Session.virtualfile_out.html

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If wrapping a new module, open a 'Wrap new GMT module' issue and submit reasonably-sized PRs.
  • If adding new functionality, add an example to docstrings or tutorials.
  • Use underscores (not hyphens) in names of Python files and directories.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash command is:

  • /format: automatically format and lint the code

@seisman seisman added the enhancement Improving an existing feature label Feb 20, 2024
@seisman seisman added this to the 0.12.0 milestone Feb 20, 2024
@seisman seisman added needs review This PR has higher priority and needs review. feature Brand new feature and removed enhancement Improving an existing feature labels Feb 20, 2024
@seisman seisman changed the title clib.Session: Add the virtualfile_to_data method for creating virtualfile for output clib.Session: Add the virtualfile_to_data method for creating output virtualfile Feb 20, 2024
@seisman
Copy link
Member Author

seisman commented Feb 21, 2024

We probably need to think more about the function name. Currently, I use virtualfile_to_data, similar to the virtualfile_from_data function. Here, from means input and to means output. virtualfile_from_data sounds like creating a virtual file from an input data, so the function name makes sense, but virtualfile_to_data sounds like converting a virtual file to data, which makes little sense.

Maybe virtualfile_in/virtualfile_out are better names for virtualfile_from_data/virtualfile_to_data? Or even the much shorter version vfile_in/vfile_out?

@seisman seisman requested a review from a team February 25, 2024 03:16
@seisman
Copy link
Member Author

seisman commented Feb 26, 2024

We probably need to think more about the function name. Currently, I use virtualfile_to_data, similar to the virtualfile_from_data function. Here, from means input and to means output. virtualfile_from_data sounds like creating a virtual file from an input data, so the function name makes sense, but virtualfile_to_data sounds like converting a virtual file to data, which makes little sense.

Maybe virtualfile_in/virtualfile_out are better names for virtualfile_from_data/virtualfile_to_data? Or even the much shorter version vfile_in/vfile_out?

Renamed virtualfile_to_data to virtualfile_out in commit a633abf.

virtualfile_from_data will be renamed to virtualfile_in in PR #3068.

@seisman seisman changed the title clib.Session: Add the virtualfile_to_data method for creating output virtualfile clib.Session: Add the virtualfile_out method for creating output virtualfile Feb 26, 2024
@seisman seisman changed the title clib.Session: Add the virtualfile_out method for creating output virtualfile clib: Add the virtualfile_out method for creating output virtualfile Feb 27, 2024
@seisman seisman added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels Feb 27, 2024
@seisman
Copy link
Member Author

seisman commented Feb 27, 2024

Ping @GenericMappingTools/pygmt-maintainers for a final review. Otherwise, I'll merge this PR in 24 hours so we can move forward.

Comment on lines 1638 to 1672
Examples
--------
>>> from pathlib import Path
>>> from pygmt.clib import Session
>>> from pygmt.datatypes import _GMT_DATASET, _GMT_GRID
>>> from pygmt.helpers import GMTTempFile
>>>
>>> # Create a virtual file for storing the output table.
>>> with GMTTempFile(suffix=".txt") as tmpfile:
... with open(tmpfile.name, mode="w") as fp:
... print("1.0 2.0 3.0 TEXT", file=fp)
... with Session() as lib:
... with lib.virtualfile_out(kind="dataset") as vouttbl:
... lib.call_module("read", f"{tmpfile.name} {vouttbl} -Td")
... ds = lib.read_virtualfile(vouttbl, kind="dataset")
>>> isinstance(ds.contents, _GMT_DATASET)
True
>>>
>>> # Create a virtual file for storing the output grid.
>>> with Session() as lib:
... with lib.virtualfile_out(kind="grid") as voutgrd:
... lib.call_module("read", f"@earth_relief_01d_g {voutgrd} -Tg")
... outgrd = lib.read_virtualfile(voutgrd, kind="grid")
>>> isinstance(outgrd.contents, _GMT_GRID)
True
>>>
>>> # Write data to file without creating a virtual file
>>> with GMTTempFile(suffix=".nc") as tmpfile:
... with Session() as lib:
... with lib.virtualfile_out(
... kind="grid", fname=tmpfile.name
... ) as voutgrd:
... lib.call_module("read", f"@earth_relief_01d_g {voutgrd} -Tg")
... assert voutgrd == tmpfile.name
... assert Path(voutgrd).stat().st_size > 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this into a new test_clib_virtualfile_out.py file? Two reasons:

  1. There might be other data types like GMT_IMAGE and GMT_CUBE in the future, and the doctest would get long.
  2. Might want to test the output of the virtualfiles more properly using pygmt.info or pygmt.grdinfo.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this into a new test_clib_virtualfile_out.py file? Two reasons:

  1. There might be other data types like GMT_IMAGE and GMT_CUBE in the future, and the doctest would get long.

The main purpose of these doctests are to show how to use this method and the outpu virtual file in other modules. So I think we can only keep the dataset one only and remove all others.

  1. Might want to test the output of the virtualfiles more properly using pygmt.info or pygmt.grdinfo.

Actually we can't. Because the virtual file is an output (i.e., GMT_OUT), it can't be passed as an input to info/grdinfo. I.e., lib.call_module("info", f"{vouttbl}") won't work.

Copy link
Member

@weiji14 weiji14 Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think we can only keep the dataset one only and remove all others.

You can move the grid one to test_clib_virtualfile_out.py

3. Might want to test the output of the virtualfiles more properly using pygmt.info or pygmt.grdinfo.

Actually we can't. Because the virtual file is an output (i.e., GMT_OUT), it can't be passed as an input to info/grdinfo. I.e., lib.call_module("info", f"{vouttbl}") won't work.

Ah ok, maybe not for the virtualfile then. But we should test that saving to a (non-virtual) file works (when fname is used), and try loading from there (i.e. roundtrip). Again, can put this in test_clib_virtualfile_out.py

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this into a new test_clib_virtualfile_out.py file? Two reasons:

  1. There might be other data types like GMT_IMAGE and GMT_CUBE in the future, and the doctest would get long.

The main purpose of these doctests are to show how to use this method and the outpu virtual file in other modules. So I think we can only keep the dataset one only and remove all others.

In commit 9108917, I've removed the doctest for grid and kept the doctests for dataset and fname. I also realized that these doctests are almost the same as the ones in read_virtualfile. This makes sense, because read_virtualfile was implemented to read the data from an output virtualfile. So, I also simplify the doctests in read_virtualfile using the newly added virtualfile_out method.

In other words, we probably need to move the doctests of read_virtualfile into a separate test file in the future.

Copy link
Member Author

@seisman seisman Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think we can only keep the dataset one only and remove all others.

You can move the grid one to test_clib_virtualfile_out.py

As mentioned in the above comment, the grid one is already doctested in the read_virtualfile method.

  1. Might want to test the output of the virtualfiles more properly using pygmt.info or pygmt.grdinfo.

Actually we can't. Because the virtual file is an output (i.e., GMT_OUT), it can't be passed as an input to info/grdinfo. I.e., lib.call_module("info", f"{vouttbl}") won't work.

Ah ok, maybe not for the virtualfile then. But we should test that saving to a (non-virtual) file works (when fname is used), and try loading from there (i.e. roundtrip). Again, can put this in test_clib_virtualfile_out.py

Done in 888f1c9 but not in a separate test file.

@seisman seisman merged commit 5014591 into main Feb 28, 2024
18 of 19 checks passed
@seisman seisman deleted the virtualfile_to_data branch February 28, 2024 04:34
@seisman seisman removed the final review call This PR requires final review and approval from a second reviewer label Feb 28, 2024
@seisman seisman added enhancement Improving an existing feature and removed feature Brand new feature labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improving an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants