Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added core versioning #103

Merged
merged 4 commits into from
Aug 14, 2023
Merged

Added core versioning #103

merged 4 commits into from
Aug 14, 2023

Conversation

dogversioning
Copy link
Contributor

This PR makes the following change:

  • Added new core__meta_version table with basic version number (i.e. major only)
  • Upload will now look for an exported {study}__meta_version table
  • If found, will add version to aggregator post. If not, will assume no version (i.e. version 0)

This is one half of the implementation for versioning - the other half is at this aggregator PR.

Checklist

  • Consider if documentation (like in docs/) needs to be updated
  • Consider if tests should be added
  • Run pylint if you're making changes beyond adding studies
  • Update template repo if there are changes to study configuration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@comorbidity Bringing this pattern to your attention - other studies can copy this almost verbatim.

@@ -36,4 +37,5 @@ export_list = [
"core__count_condition_month",
"core__count_medicationrequest_month",
"core__meta_date",
"core__meta_version",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a dumb question - but these meta tables are for the whole study, right? Why not just have one meta table with columns like date and version? I could imagine either approach making your life easier, so I assume that sort of reason is why.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my thought was this: these are both quasi-optional, and may be present or absent, and this way there isn't a table schema dependency issue if we ever have to touch one or the other individually?

prefetch_res = requests.post(
args["url"],
json={
"study": study,
"data_package": subscription,
"data_package": data_package,
"version": f"{version:03d}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the :03d formatting? That feels like it's for some implicit reason about what the aggregator would want to see? If so, I'd like to see either a comment about it or just send the aggregator an int and it can format it like it wants.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my thought was this:

  • I'd like to have sensible alphasorting version numbers, i.e. 001-009, then 010, when viewing data in buckets/transaction metadata
  • 3 decimals is a paranoid number about how many changes we should see.
  • If you are only looking at the library codebase, it might be more explicit as to what is happening

But - I can see an argument for doing this formatting on the receive side instead of the end side. I could go either way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not done with the other PR, but as far as I can tell, it doesn't care about the format of the string at all - it just places the version on disk (though I suppose it could care about the format in so far as any slashes in it would mess it up real good).

Since the aggregator doesn't care, I think what you've done here makes sense. I just bumped on it because it was an unexpected place for the concern of "the admin of the aggregator would like visually pleasing sorted versions" to appear. (that's a totally valid concern, I just didn't expect it on the library side).

I do this kind of formatting in the ETL too - when writing out ndjson formats, I use 3 decimals. I get the desire.

Your code is fine as-is. You could send an int, which solves some problems (formatting can more properly live on the aggregator side and it doesn't have to check the contents of that string), but the library loses the ability to do minor versions or some other goofy thing. Trade-offs. I might land on the side of "send an int and have aggregator own the how-does-it-look-on-disk logic", but I'm not mad about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I came around to bare int - this was more of a 'i did this side first' artifact.

@@ -70,5 +77,13 @@ def upload_files(args: dict):
sys.exit("user/id not provided, please pass --user and --id")
with get_progress_bar() as progress:
file_upload_progress = progress.add_task("Uploading", total=num_uploads)
meta_version = next(
filter(lambda x: "__meta_version" in str(x), file_paths), None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I will re-register discomfort with substring matching like this. This picks up a table like study__meta_versioned_bills (though I don't know why study authors are trying to be so mean).

Maybe x.endswith("__meta_version")? Or x == f"{study}__meta_version"

2nd nit: this next/filter/None approach feels odd to me. I don't hate it. I might have instead done this, but I can't argue it's necessarily cleaner. Just less iteration magic:

meta_versions = [path for path in file_paths if path.endswith()]  # or list(filter()) if that looks nicer to ya
if meta_versions:
  meta_version = meta_versions[0]  # we only expect one, could even verify that here
  ...
else:
  ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on 1: i'm happy to change to ends with, and that should be explicit
on 2: This basically prevents the validation/array extraction requirement, which I kind of like for brevity? but i can be talked out of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for posterity - taking suggestion 1, after some slack sidebar modified 2 to be try/except based for 'what :really: is the most "pythonic" way to do this?'

cumulus_library/upload.py Outdated Show resolved Hide resolved
@dogversioning dogversioning merged commit ddbf983 into main Aug 14, 2023
2 checks passed
@dogversioning dogversioning deleted the mg/versioning branch August 14, 2023 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants