Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-48047 : Refactor sync calls in async functions #157

Merged
merged 7 commits into from
Jan 15, 2025
Merged

Conversation

tcjennings
Copy link
Contributor

@tcjennings tcjennings commented Jan 13, 2025

Uses asynchronous functions in place of synchronous functions in several places, notably:

  • enables the flake8 ASYNC rules in ruff linting. This helps identify when blocking calls are made from asynchronous functions, but it is probably not going to catch everything.
  • os.path operations are replaced by anyio.Path operations. This is exclusive of the os.path.expandvars() method for which no pathlib analogue exists, but it is not an IO blocking operation.
  • subprocess operations are replaced by anyio.open_process operations, and the parsing of process output (stdout/stderr) is refactored using anyio stream wrappers.
  • Butler API calls are wrapped with anyio.to_thread.run_sync.

@tcjennings tcjennings force-pushed the tickets/DM-48047 branch 2 times, most recently from ab12e02 to f22e198 Compare January 13, 2025 18:55
@tcjennings tcjennings changed the base branch from main to tickets/DM-48099/v0.2.0 January 13, 2025 18:56
@tcjennings tcjennings force-pushed the tickets/DM-48047 branch 3 times, most recently from 126ec66 to b701c85 Compare January 13, 2025 22:03
@tcjennings tcjennings force-pushed the tickets/DM-48099/v0.2.0 branch from cc3d442 to 016065a Compare January 14, 2025 14:46
@tcjennings tcjennings changed the base branch from tickets/DM-48099/v0.2.0 to main January 14, 2025 14:49
@tcjennings tcjennings force-pushed the tickets/DM-48047 branch 4 times, most recently from 10cf9c5 to 656d95e Compare January 14, 2025 15:48
@tcjennings tcjennings marked this pull request as ready for review January 14, 2025 16:15
@tcjennings tcjennings force-pushed the tickets/DM-48047 branch 3 times, most recently from 64543d7 to 429494f Compare January 15, 2025 14:38
Copy link
Contributor

@eigerx eigerx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Really glad you were able to carry it over the finish line (and happy to see my changes in here too)! I had a few questions, + some comments on the butler functions and their usage that I'd be happy to discuss more with you. Should be good to go!

src/lsst/cmservice/common/bash.py Outdated Show resolved Hide resolved
with open(stamp_url, "w", encoding="utf-8") as fstamp:
fields = dict(status=StatusEnum.reviewable.name)
yaml.dump(fields, fstamp)
yaml_output = yaml.dump(dict(status=StatusEnum.reviewable.name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we determine that synchronous yaml.dump is okay? Assuming so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike the similarly named json.dump, yaml.dump is not a file operation, so it's more alike to json.dumps where the object is serialized to a string; it is a memory operation.

It is the next line that writes the yaml-formatted string to a file.

That said, the yaml.dump function does support file operations when an open file is given as an argument, so you can use it both ways.


async def submit_file_to_run_in_bash(script_url: str | Path, log_url: str | pathlib.Path) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve of the more useful title.

Prefix to script_url used when rolling back
processing
processing. Will default to CWD (".").
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[What does CWD stand for? If totally standard, no need to change the comment- but Google is unhelpful]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current Working Directory.

@@ -25,20 +29,27 @@ def remove_run_collections(
Allow for missing butler
"""
try:
butler = Butler.from_config(butler_repo, collections=[collection_name], without_datastore=True)
butler_f = partial(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today I learned about the functools partial - I will be using this in my own code!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's one of those deep cut standard library functions that you don't know you need until you learn about it.

except MissingCollectionError:
pass
except Exception as msg:
raise errors.CMButlerCallError(msg) from msg


def remove_non_run_collections(
# FIXME how is this different to `remove_run_collections`?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to suggest a different function name to address this comment. This deals with butler remove-collections vs. butler remove-runs(a mostly understood distinction by users of the Butler that our code should also reflect).

This entry in the Middleware FAQ details best practices here (and I notice that it does not use remove-runs as we do. We might want to look into following the directions in this how-to instead/why we use the less safe remove-runs method). Here is the documentation on removeRuns and here is the documentation on removeCollection. Hope this helps!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will create a story to follow up on this Butler stuff, or add it to a related story.

except Exception as msg:
raise errors.CMButlerCallError(msg) from msg


def remove_collection_from_chain( # pylint: disable=unused-argument
async def remove_collection_from_chain(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the scope of this PR, but see above comment and Middleware FAQ for how we could do this in future.

@@ -97,7 +114,7 @@ def remove_collection_from_chain( # pylint: disable=unused-argument
raise NotImplementedError


def remove_datasets_from_collections( # pylint: disable=unused-argument
async def remove_datasets_from_collections(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as previous.

with open(full_file_path, encoding="utf-8") as fin:
data = yaml.safe_load(fin)

full_file_path = await Path(os.path.expandvars(file_path)).resolve()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, does wrapping this in Path make it play nicely with await ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the fancy async-aware anyio.Path, though, not the normal pathlib.Path. So any Path operations that could be blocking are now await-able.

They have a handy list of such methods on their doc site.

@tcjennings tcjennings merged commit 8fa552e into main Jan 15, 2025
8 checks passed
@tcjennings tcjennings deleted the tickets/DM-48047 branch January 15, 2025 23:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants