-
Notifications
You must be signed in to change notification settings - Fork 219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clib: Add virtualfile_to_dataset method for converting virtualfile to a dataset #3083
Conversation
a0d7ed0
to
6827c82
Compare
pygmt/clib/session.py
Outdated
@@ -1738,6 +1738,119 @@ def read_virtualfile( | |||
dtype = {"dataset": _GMT_DATASET, "grid": _GMT_GRID}[kind] | |||
return ctp.cast(pointer, ctp.POINTER(dtype)) | |||
|
|||
def return_table( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method name return_table
was initially proposed in #1318 (comment), but is it a good name? At line 1845, we use kind="dataset"
, so maybe rename it to return_dataset
or output_dataset
? In the future, I think we will add more methods that return a grid/image/cpt/cube, so consistent method names are preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think return_dataset
is better than return_table
. Renamed in ce029b2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about virtualfile_to_dataset
, or vfile_to_dataset
(shorter)? A bit more explicit to say that a conversion is happening from a virtualfile
to a tabular dataset
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
virtualfile_to_dataset
/vfile_to_dataset
sounds a good name. I like vfile_to_dataset
, but do we want to also rename other functions for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean rename the virtualfile_from_*
methods? Let's maybe not do that (lazy to deprecate more names). We can go with virtualfile_to_dataset
for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to rename the virtualfile_in
/virtualfile_out
to vfile_in
/vfile_out
.
virtualfile_from_*
functions are no longer used in our PyGMT wrappers after Universal virtualfile_from_data function to replace virtualfile_from_grid/matrix/vectors #949, so it's OK to keep them unchanged.virtualfile_out
was added in clib: Add the virtualfile_out method for creating output virtualfile #3057, so we can still rename it without deprecations.virtualfile_in
was renamed fromvirtualfile_from_data
recently (PR clib: Rename the virtualfile_from_data method to virtualfile_in #3068), so we can rename it tovfile_in
without more deprecations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep with the long name (virtualfile_to_dataset
). Renaming virtualfile_out
-> vfile_out
and virtualfile_in
-> vfile_in
will be more work and confusing (we'll need to manually update the changelog to track that virtualfile_from_data
became virtualfile_in
which became vfile_in
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 711142c.
@@ -1738,6 +1738,123 @@ def read_virtualfile( | |||
dtype = {"dataset": _GMT_DATASET, "grid": _GMT_GRID}[kind] | |||
return ctp.cast(pointer, ctp.POINTER(dtype)) | |||
|
|||
def return_dataset( | |||
self, | |||
output_type: Literal["pandas", "numpy", "file"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we set a default output type here? It looks like we're using pandas
as the default in #3092.
output_type: Literal["pandas", "numpy", "file"], | |
output_type: Literal["pandas", "numpy", "file"] = "pandas", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes no differences because we always call the function with the output_type
parameter, e.g.,:
return lib.return_dataset(
output_type=output_type,
vfile=vouttbl,
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it doesn't make any difference in the PyGMT modules, but this is a good central location to document that output_type="pandas"
is the default output (though in #1318, it seemed like most of us were in favour of output_type="input"
or auto as the default).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
output_type="input"
or auto
may not make sense for PyGMT, especially in cases like:
- the input data is a file, then
auto
means outputting to a file by default, thenoutfile
is required. - the input data is vectors (e.g.,
x
/y
/z
) and each vector can be a list/ndarray/pd.Series. Then what's the expected format ifauto
/input
is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, not saying that output_type="auto"
would be easy to implement 🙂 I think the default output_type="pandas"
is fine for now since it is an in-memory format that can be converted to virtualfiles relatively easily. We can discuss more about what the ideal output type would be in #1318 (if there is still any debate that needs to be had).
pygmt/clib/session.py
Outdated
@@ -1738,6 +1738,119 @@ def read_virtualfile( | |||
dtype = {"dataset": _GMT_DATASET, "grid": _GMT_GRID}[kind] | |||
return ctp.cast(pointer, ctp.POINTER(dtype)) | |||
|
|||
def return_table( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about virtualfile_to_dataset
, or vfile_to_dataset
(shorter)? A bit more explicit to say that a conversion is happening from a virtualfile
to a tabular dataset
.
Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
@@ -294,6 +294,7 @@ conversion of Python variables to GMT virtual files: | |||
clib.Session.virtualfile_from_grid | |||
clib.Session.virtualfile_in | |||
clib.Session.virtualfile_out | |||
clib.Session.virtualfile_to_dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At L287 above, could you change the sentence to read "These methods are context managers that automate the conversion of Python variables to and from GMT virtual files"? Since we can convert GMT virtualfiles to Python objects now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in aee0499.
In commit cd5b31d, I've changed the parameter name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one more small suggestion. Ok to leave out setting output_type="pandas"
as the default setting (https://github.com/GenericMappingTools/pygmt/pull/3083/files#r1518966573) for now.
Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
Thanks for all the feedbacks. Now marking the PR as "final review call" and will merge it after 24 hours. |
Add the new
Session.virtualfile_to_dataset()
method to simplify the logic of dealing with table outputs in different wrappers. Draft PR #3092 shows how the newSession.virtualfile_to_dataset()
method can simplify other wrappers.