-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-48047 : Refactor sync calls in async functions #157
Conversation
ab12e02
to
f22e198
Compare
126ec66
to
b701c85
Compare
cc3d442
to
016065a
Compare
10cf9c5
to
656d95e
Compare
64543d7
to
429494f
Compare
429494f
to
788ab1e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Really glad you were able to carry it over the finish line (and happy to see my changes in here too)! I had a few questions, + some comments on the butler functions and their usage that I'd be happy to discuss more with you. Should be good to go!
with open(stamp_url, "w", encoding="utf-8") as fstamp: | ||
fields = dict(status=StatusEnum.reviewable.name) | ||
yaml.dump(fields, fstamp) | ||
yaml_output = yaml.dump(dict(status=StatusEnum.reviewable.name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we determine that synchronous yaml.dump
is okay? Assuming so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike the similarly named json.dump
, yaml.dump
is not a file operation, so it's more alike to json.dumps
where the object is serialized to a string; it is a memory operation.
It is the next line that writes the yaml-formatted string to a file.
That said, the yaml.dump
function does support file operations when an open file is given as an argument, so you can use it both ways.
|
||
async def submit_file_to_run_in_bash(script_url: str | Path, log_url: str | pathlib.Path) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I approve of the more useful title.
Prefix to script_url used when rolling back | ||
processing | ||
processing. Will default to CWD ("."). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[What does CWD stand for? If totally standard, no need to change the comment- but Google is unhelpful]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current Working Directory.
@@ -25,20 +29,27 @@ def remove_run_collections( | |||
Allow for missing butler | |||
""" | |||
try: | |||
butler = Butler.from_config(butler_repo, collections=[collection_name], without_datastore=True) | |||
butler_f = partial( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today I learned about the functools partial
- I will be using this in my own code!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's one of those deep cut standard library functions that you don't know you need until you learn about it.
except MissingCollectionError: | ||
pass | ||
except Exception as msg: | ||
raise errors.CMButlerCallError(msg) from msg | ||
|
||
|
||
def remove_non_run_collections( | ||
# FIXME how is this different to `remove_run_collections`? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to suggest a different function name to address this comment. This deals with butler remove-collections
vs. butler remove-runs
(a mostly understood distinction by users of the Butler that our code should also reflect).
This entry in the Middleware FAQ details best practices here (and I notice that it does not use remove-runs
as we do. We might want to look into following the directions in this how-to instead/why we use the less safe remove-runs
method). Here is the documentation on removeRuns and here is the documentation on removeCollection. Hope this helps!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will create a story to follow up on this Butler stuff, or add it to a related story.
except Exception as msg: | ||
raise errors.CMButlerCallError(msg) from msg | ||
|
||
|
||
def remove_collection_from_chain( # pylint: disable=unused-argument | ||
async def remove_collection_from_chain( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the scope of this PR, but see above comment and Middleware FAQ for how we could do this in future.
@@ -97,7 +114,7 @@ def remove_collection_from_chain( # pylint: disable=unused-argument | |||
raise NotImplementedError | |||
|
|||
|
|||
def remove_datasets_from_collections( # pylint: disable=unused-argument | |||
async def remove_datasets_from_collections( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as previous.
with open(full_file_path, encoding="utf-8") as fin: | ||
data = yaml.safe_load(fin) | ||
|
||
full_file_path = await Path(os.path.expandvars(file_path)).resolve() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, does wrapping this in Path
make it play nicely with await
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the fancy async-aware anyio.Path
, though, not the normal pathlib.Path
. So any Path operations that could be blocking are now await-able.
They have a handy list of such methods on their doc site.
Uses asynchronous functions in place of synchronous functions in several places, notably:
os.path
operations are replaced byanyio.Path
operations. This is exclusive of theos.path.expandvars()
method for which nopathlib
analogue exists, but it is not an IO blocking operation.subprocess
operations are replaced byanyio.open_process
operations, and the parsing of process output (stdout/stderr) is refactored usinganyio
stream wrappers.anyio.to_thread.run_sync
.