Added core versioning #103

dogversioning · 2023-08-08T15:00:17Z

This PR makes the following change:

Added new core__meta_version table with basic version number (i.e. major only)
Upload will now look for an exported {study}__meta_version table
If found, will add version to aggregator post. If not, will assume no version (i.e. version 0)

This is one half of the implementation for versioning - the other half is at this aggregator PR.

Checklist

Consider if documentation (like in docs/) needs to be updated
Consider if tests should be added
Run pylint if you're making changes beyond adding studies
Update template repo if there are changes to study configuration

dogversioning · 2023-08-08T15:02:14Z

cumulus_library/studies/core/version.sql

@comorbidity Bringing this pattern to your attention - other studies can copy this almost verbatim.

mikix · 2023-08-08T15:08:55Z

cumulus_library/studies/core/manifest.toml

@@ -36,4 +37,5 @@ export_list = [
    "core__count_condition_month",
    "core__count_medicationrequest_month",
    "core__meta_date",
+    "core__meta_version",


This might be a dumb question - but these meta tables are for the whole study, right? Why not just have one meta table with columns like date and version? I could imagine either approach making your life easier, so I assume that sort of reason is why.

I guess my thought was this: these are both quasi-optional, and may be present or absent, and this way there isn't a table schema dependency issue if we ever have to touch one or the other individually?

mikix · 2023-08-08T15:11:51Z

cumulus_library/upload.py

    prefetch_res = requests.post(
        args["url"],
        json={
            "study": study,
-            "data_package": subscription,
+            "data_package": data_package,
+            "version": f"{version:03d}",


Why the :03d formatting? That feels like it's for some implicit reason about what the aggregator would want to see? If so, I'd like to see either a comment about it or just send the aggregator an int and it can format it like it wants.

So my thought was this:

I'd like to have sensible alphasorting version numbers, i.e. 001-009, then 010, when viewing data in buckets/transaction metadata

3 decimals is a paranoid number about how many changes we should see.

If you are only looking at the library codebase, it might be more explicit as to what is happening

But - I can see an argument for doing this formatting on the receive side instead of the end side. I could go either way.

I'm not done with the other PR, but as far as I can tell, it doesn't care about the format of the string at all - it just places the version on disk (though I suppose it could care about the format in so far as any slashes in it would mess it up real good).

Since the aggregator doesn't care, I think what you've done here makes sense. I just bumped on it because it was an unexpected place for the concern of "the admin of the aggregator would like visually pleasing sorted versions" to appear. (that's a totally valid concern, I just didn't expect it on the library side).

I do this kind of formatting in the ETL too - when writing out ndjson formats, I use 3 decimals. I get the desire.

Your code is fine as-is. You could send an int, which solves some problems (formatting can more properly live on the aggregator side and it doesn't have to check the contents of that string), but the library loses the ability to do minor versions or some other goofy thing. Trade-offs. I might land on the side of "send an int and have aggregator own the how-does-it-look-on-disk logic", but I'm not mad about it.

I came around to bare int - this was more of a 'i did this side first' artifact.

mikix · 2023-08-08T15:14:22Z

cumulus_library/upload.py

@@ -70,5 +77,13 @@ def upload_files(args: dict):
        sys.exit("user/id not provided, please pass --user and --id")
    with get_progress_bar() as progress:
        file_upload_progress = progress.add_task("Uploading", total=num_uploads)
+        meta_version = next(
+            filter(lambda x: "__meta_version" in str(x), file_paths), None


nit: I will re-register discomfort with substring matching like this. This picks up a table like study__meta_versioned_bills (though I don't know why study authors are trying to be so mean).

Maybe x.endswith("__meta_version")? Or x == f"{study}__meta_version"

2nd nit: this next/filter/None approach feels odd to me. I don't hate it. I might have instead done this, but I can't argue it's necessarily cleaner. Just less iteration magic:

meta_versions = [path for path in file_paths if path.endswith()] # or list(filter()) if that looks nicer to ya if meta_versions: meta_version = meta_versions[0] # we only expect one, could even verify that here ... else: ...

on 1: i'm happy to change to ends with, and that should be explicit
on 2: This basically prevents the validation/array extraction requirement, which I kind of like for brevity? but i can be talked out of it.

for posterity - taking suggestion 1, after some slack sidebar modified 2 to be try/except based for 'what :really: is the most "pythonic" way to do this?'

cumulus_library/upload.py

added core versioning

617f681

dogversioning mentioned this pull request Aug 8, 2023

Data versioning support smart-on-fhir/cumulus-aggregator#99

Merged

dogversioning requested a review from mikix August 8, 2023 15:01

dogversioning commented Aug 8, 2023

View reviewed changes

test updates

e8b72c4

mikix approved these changes Aug 8, 2023

View reviewed changes

mikix reviewed Aug 8, 2023

View reviewed changes

cumulus_library/upload.py Outdated Show resolved Hide resolved

dogversioning added 2 commits August 8, 2023 12:12

PR cleanup

3573cf5

tweaks to version detection

356fafd

dogversioning merged commit ddbf983 into main Aug 14, 2023
2 checks passed

dogversioning deleted the mg/versioning branch August 14, 2023 14:38

dogversioning mentioned this pull request Aug 14, 2023

SPIKE: how do we handle versions of studies? #73

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added core versioning #103

Added core versioning #103

dogversioning commented Aug 8, 2023

dogversioning Aug 8, 2023

mikix Aug 8, 2023

dogversioning Aug 8, 2023

mikix Aug 8, 2023

dogversioning Aug 8, 2023

mikix Aug 8, 2023

dogversioning Aug 8, 2023

mikix Aug 8, 2023

dogversioning Aug 8, 2023

dogversioning Aug 8, 2023

Added core versioning #103

Added core versioning #103

Conversation

dogversioning commented Aug 8, 2023

Checklist

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment