`Usage:format`: per-usage format spec #54

PeterKraus · 2024-03-12T08:08:19Z

Sparked by marda-alliance/metadata_extractors_registry#78 (comment)

It might be useful to have a mechanism to indicate what package/library needs to be present in the "caller" environment in order to understand the format of the objects returned in-memory.

Currently, we only have an install target of [formats] in the API, which installs xarray and pandas into the parent environment. However, if the required library is not present in the parent environment, the unpickling of the shared memory object will fail. We can annotate what's required (should be a single library per usage, in my opinion) here, and then modify the API to use this data.
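The failure mode described above can be reproduced without any third-party package. The following is a minimal sketch, assuming a stand-in module named `fake_extractor_lib` (a hypothetical name, not part of the API or the registry) that exists where the object is pickled but not where it is loaded:

```python
import pickle
import sys
import types

# Hypothetical stand-in for a library that is only installed in the
# extractor's environment.
lib = types.ModuleType("fake_extractor_lib")
Table = type("Table", (), {"__module__": "fake_extractor_lib"})
lib.Table = Table
sys.modules["fake_extractor_lib"] = lib

# "Extractor side": the library is importable, so pickling works.
obj = Table()
obj.rows = [1, 2, 3]
payload = pickle.dumps(obj)

# "Caller side": simulate an environment where the library is absent.
del sys.modules["fake_extractor_lib"]


def try_load(data):
    """Unpickle data, returning a readable message if a module is missing."""
    try:
        return pickle.loads(data)
    except ModuleNotFoundError as exc:
        return f"cannot unpickle: missing module {exc.name!r}"


print(try_load(payload))
```

Pickle stores only a reference to the class (module plus name), so the caller has to be able to import the same module to rebuild the object.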

See the Extractor-datatree.yaml example file to see what I mean in more detail.


ml-evs

I think this one needs a bit more discussion tbh. For this field to be truly machine-actionable it somehow also needs to have the same granularity as the install instructions (in my mind), which also feels like overkill... I get the idea that each extractor command may have a different "format", but I was imagining something more like an additional config in the install metadata that specifies any arbitrary output_requirements or somesuch. This is all pretty awkward anyway, as the point of the API is that each extractor is isolated, so if we're allowing extractors to mark that they need arbitrary (/conflicting) Python reqs installed in the top-level executing environment then we're kinda scuppered.

This leaves us with the current option, that there are a set of "blessed" packages, e.g., pandas, xarray, (perhaps nexus) and then of course any generic Python objects, that are "supported" in this mode, in which case this field does not need to be machine-actionable but can provide useful info to a user not using the reference implementation.

What do you think?

PeterKraus · 2024-03-19T19:07:11Z

I expected this to need some thought.

I don't want to implement another package manager (hence the npm story I shared with you). The idea for this one was really to provide a way to:

- add some metadata on what's required downstream of the tractor to "understand" what's coming in memory from upstream
- search the yard, for instance if galvani needs just a pandas but yadg needs xarray ...
- be able to provide a better hint in tractor beam about what's gone wrong when the unpickling fails
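As a rough illustration of how such metadata could serve these goals, here is a sketch only. The `requires` key, the `YARD` dictionary and both helper functions are hypothetical and do not reflect the actual schema or API; the galvani/pandas and yadg/xarray pairing is taken from the example above.

```python
from importlib.util import find_spec

# Hypothetical registry entries; "requires" stands in for the per-usage
# field discussed in this PR.
YARD = {
    "galvani": {"requires": "pandas"},
    "yadg": {"requires": "xarray"},
    "plain": {"requires": None},  # returns only native Python objects
}


def is_available(package):
    """True if no package is needed or it can be found in this environment."""
    return package is None or find_spec(package) is not None


def usable_extractors(yard, available=is_available):
    """Search the yard for extractors whose output can be understood here."""
    return sorted(n for n, meta in yard.items() if available(meta["requires"]))


def unpickle_hint(extractor, yard, available=is_available):
    """Return a hint for the user, or None if the requirement is satisfied."""
    req = yard[extractor]["requires"]
    if available(req):
        return None
    return f"Output of {extractor!r} needs {req!r} in the calling environment."
```

The `available` argument is there only so the check can be swapped out, for example in tests; nothing here installs anything, which keeps it well short of a package manager.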

I am also not 100% happy with calling it "format" and the description could be improved to clarify the above. Would you be OK with that?

PeterKraus · 2024-04-09T14:48:42Z

In today's meeting, we decided that it's reasonable to expect the in-memory returned object to be either a native Python object, or something that is understood by pandas or xarray (i.e. the current content of [formats], which ought to be installed by default; see marda-alliance/metadata_extractors_api#33).
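One way to picture this convention is a check on the top-level module of the returned object's type. This is a simplified sketch, not something the API does: the `classify` helper and the `BLESSED` set are hypothetical names, and "native" is narrowed here to built-in types only.

```python
# Packages assumed to be present in the caller environment via [formats].
BLESSED = {"pandas", "xarray"}


def classify(obj):
    """Classify an object by the top-level module that defines its type."""
    top = type(obj).__module__.split(".")[0]
    if top == "builtins":
        return "native"
    if top in BLESSED:
        return "blessed"
    return "unsupported"
```

The check looks only at the outermost object, so a list containing objects from another library would still be reported as native.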

We hope to implement a proper output spec once more packages are in the registry. Closing.

Format.

db0750f

PeterKraus requested review from ml-evs and davidelbert as code owners March 12, 2024 08:08

pre-commit-ci bot and others added 2 commits March 12, 2024 08:08

[pre-commit.ci] auto fixes from pre-commit.com hooks

95a1089

for more information, see https://pre-commit.ci

Fix formatting.

2c07c0f

ml-evs reviewed Mar 19, 2024


PeterKraus closed this Apr 9, 2024
