-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added core versioning #103
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@comorbidity Bringing this pattern to your attention - other studies can copy this almost verbatim.
@@ -36,4 +37,5 @@ export_list = [ | |||
"core__count_condition_month", | |||
"core__count_medicationrequest_month", | |||
"core__meta_date", | |||
"core__meta_version", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a dumb question - but these meta tables are for the whole study, right? Why not just have one meta table with columns like date
and version
? I could imagine either approach making your life easier, so I assume that sort of reason is why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess my thought was this: these are both quasi-optional, and may be present or absent, and this way there isn't a table schema dependency issue if we ever have to touch one or the other individually?
cumulus_library/upload.py
Outdated
prefetch_res = requests.post( | ||
args["url"], | ||
json={ | ||
"study": study, | ||
"data_package": subscription, | ||
"data_package": data_package, | ||
"version": f"{version:03d}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the :03d
formatting? That feels like it's for some implicit reason about what the aggregator would want to see? If so, I'd like to see either a comment about it or just send the aggregator an int and it can format it like it wants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So my thought was this:
- I'd like to have sensible alphasorting version numbers, i.e. 001-009, then 010, when viewing data in buckets/transaction metadata
- 3 decimals is a paranoid number about how many changes we should see.
- If you are only looking at the library codebase, it might be more explicit as to what is happening
But - I can see an argument for doing this formatting on the receive side instead of the end side. I could go either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not done with the other PR, but as far as I can tell, it doesn't care about the format of the string at all - it just places the version on disk (though I suppose it could care about the format in so far as any slashes in it would mess it up real good).
Since the aggregator doesn't care, I think what you've done here makes sense. I just bumped on it because it was an unexpected place for the concern of "the admin of the aggregator would like visually pleasing sorted versions" to appear. (that's a totally valid concern, I just didn't expect it on the library side).
I do this kind of formatting in the ETL too - when writing out ndjson formats, I use 3 decimals. I get the desire.
Your code is fine as-is. You could send an int, which solves some problems (formatting can more properly live on the aggregator side and it doesn't have to check the contents of that string), but the library loses the ability to do minor versions or some other goofy thing. Trade-offs. I might land on the side of "send an int and have aggregator own the how-does-it-look-on-disk logic", but I'm not mad about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I came around to bare int - this was more of a 'i did this side first' artifact.
cumulus_library/upload.py
Outdated
@@ -70,5 +77,13 @@ def upload_files(args: dict): | |||
sys.exit("user/id not provided, please pass --user and --id") | |||
with get_progress_bar() as progress: | |||
file_upload_progress = progress.add_task("Uploading", total=num_uploads) | |||
meta_version = next( | |||
filter(lambda x: "__meta_version" in str(x), file_paths), None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I will re-register discomfort with substring matching like this. This picks up a table like study__meta_versioned_bills
(though I don't know why study authors are trying to be so mean).
Maybe x.endswith("__meta_version")
? Or x == f"{study}__meta_version"
2nd nit: this next/filter/None approach feels odd to me. I don't hate it. I might have instead done this, but I can't argue it's necessarily cleaner. Just less iteration magic:
meta_versions = [path for path in file_paths if path.endswith()] # or list(filter()) if that looks nicer to ya
if meta_versions:
meta_version = meta_versions[0] # we only expect one, could even verify that here
...
else:
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on 1: i'm happy to change to ends with, and that should be explicit
on 2: This basically prevents the validation/array extraction requirement, which I kind of like for brevity? but i can be talked out of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for posterity - taking suggestion 1, after some slack sidebar modified 2 to be try/except based for 'what :really: is the most "pythonic" way to do this?'
This PR makes the following change:
This is one half of the implementation for versioning - the other half is at this aggregator PR.
Checklist
docs/
) needs to be updated