DM-48047 : Refactor sync calls in async functions #157

tcjennings · 2025-01-13T15:21:21Z

Uses asynchronous functions in place of synchronous functions in several places, notably:

enables the flake8 ASYNC rules in ruff linting. This helps identify when blocking calls are made from asynchronous functions, but it is probably not going to catch everything.
os.path operations are replaced by anyio.Path operations. This is exclusive of the os.path.expandvars() method for which no pathlib analogue exists, but it is not an IO blocking operation.
subprocess operations are replaced by anyio.open_process operations, and the parsing of process output (stdout/stderr) is refactored using anyio stream wrappers.
Butler API calls are wrapped with anyio.to_thread.run_sync.

eigerx

This looks great. Really glad you were able to carry it over the finish line (and happy to see my changes in here too)! I had a few questions, + some comments on the butler functions and their usage that I'd be happy to discuss more with you. Should be good to go!

src/lsst/cmservice/common/bash.py

eigerx · 2025-01-15T18:10:06Z

src/lsst/cmservice/common/bash.py

-        with open(stamp_url, "w", encoding="utf-8") as fstamp:
-            fields = dict(status=StatusEnum.reviewable.name)
-            yaml.dump(fields, fstamp)
+        yaml_output = yaml.dump(dict(status=StatusEnum.reviewable.name))


Did we determine that synchronous yaml.dump is okay? Assuming so.

Unlike the similarly named json.dump, yaml.dump is not a file operation, so it's more alike to json.dumps where the object is serialized to a string; it is a memory operation.

It is the next line that writes the yaml-formatted string to a file.

That said, the yaml.dump function does support file operations when an open file is given as an argument, so you can use it both ways.

eigerx · 2025-01-15T18:13:24Z

src/lsst/cmservice/common/bash.py


+async def submit_file_to_run_in_bash(script_url: str | Path, log_url: str | pathlib.Path) -> None:


I approve of the more useful title.

eigerx · 2025-01-15T18:17:46Z

src/lsst/cmservice/common/bash.py

        Prefix to script_url used when rolling back
-        processing
+        processing. Will default to CWD (".").


[What does CWD stand for? If totally standard, no need to change the comment- but Google is unhelpful]

Current Working Directory.

eigerx · 2025-01-15T18:55:45Z

src/lsst/cmservice/common/butler.py

@@ -25,20 +29,27 @@ def remove_run_collections(
        Allow for missing butler
    """
    try:
-        butler = Butler.from_config(butler_repo, collections=[collection_name], without_datastore=True)
+        butler_f = partial(


Today I learned about the functools partial - I will be using this in my own code!

It's one of those deep cut standard library functions that you don't know you need until you learn about it.

eigerx · 2025-01-15T19:08:06Z

src/lsst/cmservice/common/butler.py

    except MissingCollectionError:
        pass
    except Exception as msg:
        raise errors.CMButlerCallError(msg) from msg


-def remove_non_run_collections(
+# FIXME how is this different to `remove_run_collections`?


I'm going to suggest a different function name to address this comment. This deals with butler remove-collections vs. butler remove-runs(a mostly understood distinction by users of the Butler that our code should also reflect).

This entry in the Middleware FAQ details best practices here (and I notice that it does not use remove-runs as we do. We might want to look into following the directions in this how-to instead/why we use the less safe remove-runs method). Here is the documentation on removeRuns and here is the documentation on removeCollection. Hope this helps!

I will create a story to follow up on this Butler stuff, or add it to a related story.

eigerx · 2025-01-15T19:11:13Z

src/lsst/cmservice/common/butler.py

    except Exception as msg:
        raise errors.CMButlerCallError(msg) from msg


-def remove_collection_from_chain(  # pylint: disable=unused-argument
+async def remove_collection_from_chain(


Not the scope of this PR, but see above comment and Middleware FAQ for how we could do this in future.

eigerx · 2025-01-15T19:11:34Z

src/lsst/cmservice/common/butler.py

@@ -97,7 +114,7 @@ def remove_collection_from_chain(  # pylint: disable=unused-argument
    raise NotImplementedError


-def remove_datasets_from_collections(  # pylint: disable=unused-argument
+async def remove_datasets_from_collections(


Same comment as previous.

eigerx · 2025-01-15T22:27:30Z

src/lsst/cmservice/db/script_template.py

-        with open(full_file_path, encoding="utf-8") as fin:
-            data = yaml.safe_load(fin)
-
+        full_file_path = await Path(os.path.expandvars(file_path)).resolve()


Hm, does wrapping this in Path make it play nicely with await ?

Yes, this is the fancy async-aware anyio.Path, though, not the normal pathlib.Path. So any Path operations that could be blocking are now await-able.

They have a handy list of such methods on their doc site.

tcjennings force-pushed the tickets/DM-48047 branch 2 times, most recently from ab12e02 to f22e198 Compare January 13, 2025 18:55

tcjennings changed the base branch from main to tickets/DM-48099/v0.2.0 January 13, 2025 18:56

tcjennings mentioned this pull request Jan 13, 2025

DM-48099 : Release v0.2.0 #155

Closed

tcjennings force-pushed the tickets/DM-48047 branch 3 times, most recently from 126ec66 to b701c85 Compare January 13, 2025 22:03

tcjennings force-pushed the tickets/DM-48099/v0.2.0 branch from cc3d442 to 016065a Compare January 14, 2025 14:46

tcjennings changed the base branch from tickets/DM-48099/v0.2.0 to main January 14, 2025 14:49

tcjennings force-pushed the tickets/DM-48047 branch 4 times, most recently from 10cf9c5 to 656d95e Compare January 14, 2025 15:48

tcjennings marked this pull request as ready for review January 14, 2025 16:15

tcjennings force-pushed the tickets/DM-48047 branch 3 times, most recently from 64543d7 to 429494f Compare January 15, 2025 14:38

eigerx and others added 6 commits January 15, 2025 15:19

Add asynchronous rules to ruff linter configuration

523b15e

Addressed initial ruff async errors, added helpful error messages

7aea3ff

More asyncification in src/lsst/cmservice/common/bash.py

2f3667c

feat(async): Use anyio Path operations for async File ops

47e3fd6

feat(async): Async htcondor/slurm subprocess operations

c164fb2

feat(async): Use async with Butler operations

788ab1e

tcjennings force-pushed the tickets/DM-48047 branch from 429494f to 788ab1e Compare January 15, 2025 21:21

eigerx approved these changes Jan 15, 2025

View reviewed changes

chore: Address PR feedback

f2961de

tcjennings merged commit 8fa552e into main Jan 15, 2025
8 checks passed

tcjennings deleted the tickets/DM-48047 branch January 15, 2025 23:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DM-48047 : Refactor sync calls in async functions #157

DM-48047 : Refactor sync calls in async functions #157

tcjennings commented Jan 13, 2025 •

edited

Loading

eigerx left a comment

eigerx Jan 15, 2025

tcjennings Jan 15, 2025

eigerx Jan 15, 2025

eigerx Jan 15, 2025

tcjennings Jan 15, 2025

eigerx Jan 15, 2025

tcjennings Jan 15, 2025

eigerx Jan 15, 2025

tcjennings Jan 15, 2025

eigerx Jan 15, 2025

eigerx Jan 15, 2025

eigerx Jan 15, 2025

tcjennings Jan 15, 2025


		async def submit_file_to_run_in_bash(script_url: str \| Path, log_url: str \| pathlib.Path) -> None:

DM-48047 : Refactor sync calls in async functions #157

DM-48047 : Refactor sync calls in async functions #157

Conversation

tcjennings commented Jan 13, 2025 • edited Loading

eigerx left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tcjennings commented Jan 13, 2025 •

edited

Loading