-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add schema hinting to new sheets functionality (C4-1088) #280
Merged
Merged
Changes from all commits
Commits
Show all changes
68 commits
Select commit
Hold shift + click to select a range
4c84f0b
First cut at tools for parsing workbooks.
netsettler 7b73a67
Refactor to separate some functionality into a separate sevice class.
netsettler 3d4573f
Add a csv file for testing.
netsettler f4e5cfa
Add some negative testing.
netsettler e9d2465
Update lock file.
netsettler 6e9060f
Document new sheets_utils module.
netsettler df12c91
Issue a beta for this functionality.
netsettler 6a39c8a
Fix documentation for sheet_utils.
netsettler eedb5c6
Add some declarations. Small refactors to improve modularity.
netsettler a6b68fe
Rearrange some methods for presentational reasons.
netsettler 3ff63a9
First cut at useful functionality.
netsettler 39bd2e0
Some name changes to make things more abstract. workbook becomes read…
netsettler 77b72f6
Rename sheetname to tabname throughout, to be more clear that this is…
netsettler ba8c55c
Add some doc strings. Rename load_table_set to just load. Arrange for…
netsettler 50488cb
Add load_items function. Fix some test names. Update changelog.
netsettler 807e525
Experimental bug fix from Will to hopefully make get_schema_names work.
netsettler 2a8e81a
update changelog
netsettler 718054a
Update dcicutils/sheet_utils.py
netsettler 682c95a
Merge branch 'master' into kmp_sheet_utils
netsettler 582f002
Merge branch 'kmp_sheet_utils' into kmp_sheet_utils_refactor_for_csv
netsettler 56d1459
Add some comments in response to Doug's code review.
netsettler 2facf9e
Support TSV files.
netsettler bcc4e63
Add changelog info about tsv files.
netsettler 9de282e
Add a missing data file.
netsettler 8d6495f
First stable cut at schema hinting. Doesn't find schemas automaticall…
netsettler 3a103ee
Merge branch 'master' into kmp_sheet_utils
netsettler 56f702a
Mark chardet as an acceptable license for use.
netsettler 08d428e
Merge branch 'kmp_sheet_utils' into kmp_sheet_utils_refactor_for_csv
netsettler 42ad579
Merge branch 'kmp_sheet_utils_refactor_for_csv' into kmp_sheet_utils_…
netsettler 60ada3f
Backport some small fixes and cosmetics from the schemas branch.
netsettler 690a833
Cosmetic fix.
netsettler 946b998
Add some missing newlines in data files.
netsettler 36e7de0
Support for coping with .tsv files where trailing whitespace is 'help…
netsettler a51fb27
PEP8
netsettler 6f097a6
Document our choice of why is_uuid is defined here as it is.
netsettler 477c7a2
PEP8
netsettler 09b4c43
Fix error handling to be clearer.
netsettler f3bd815
Fix CHANGELOG to reflect recent renamings.
netsettler 7627f6f
Fix a type hint and some PEP8.
netsettler 98cd37c
Implement a cut at escaping for tsv files.
netsettler 3852e56
Add a test case for all of the pieces of parsing and schema hinting p…
netsettler 660df9c
Small cosmetic changes and some additional support for upcoming work.
netsettler 34d528b
Fix a unit test to conform to new google account name.
netsettler 1c34ad0
Fix typo in comment (dcicutils/misc_utils.py)
netsettler 41fad79
Add some doc strings and comments.
netsettler 6e8ce2c
Rename tabname to tab_name throughout the sheet_utils interfaces.
netsettler 04eb58c
Add support for reading inserts dirs, .json, .jsonl (two formats), an…
netsettler ce9f9bc
Bump beta version.
netsettler 0ea5b62
Add yaml formats.
netsettler bcc1128
Add class AbstractItemManager. Rename InsertsItemManager to InsertsDi…
netsettler 7de093a
Rename ._parser() to ._parse_json_data(). Factor type checks out of .…
netsettler b01e34b
Rename _parse_json_data, _load_json_data, and _check_json_data, respe…
netsettler 0ae48ee
WIP. Testing good.
netsettler 1e2c5a9
WIP. Tests passing.
netsettler b8a4c39
Rearrange the way escaping= works so both csv an tsv files can using …
netsettler a2fe079
Separate registration of regular table set managers from registration…
netsettler 91ddce0
Stub in checking of required headers.
netsettler 142a20b
Bump beta version.
netsettler e09af07
PEP8
netsettler 70762c6
Merge branch 'kmp_schemas_from_vapp' into kmp_sheet_utils_with_vapp
netsettler 7d2ecaa
Fix a bug in newly proposed ff_utils.get_schemas with vapp.
netsettler 5e46273
Extend VirtualApp to amke it easier to test by adding an AbstractVirt…
netsettler 53de60a
Implement portal_vapp= in sheet_utils.
netsettler 630720f
Simplifications per Will's code review.
netsettler 5a07b69
Merge utils 7.10.0 from master.
netsettler 295adfe
Merge pull request #282 from 4dn-dcic/kmp_sheet_utils_with_vapp
netsettler 486adce
Merge branch 'master' into kmp_sheet_utils_schema_hinting
netsettler 54c51aa
Add support for zipped files.
netsettler File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm just misreading, but somehow this comment confused me - I thought you were saying want more strict checking (i.e. only confirm as uuid if it has the dashes) but you want less strict (dashes optional). And interesting, I did not realize curly braces are sometimes OK around UUIDs; weird that allowed to not match though, but that looks like what we actually accept in portal URLs anyways.
But I think this is not quite right; as it is, for example, it says that this is a valid UUID:
08AF90EB-C847-43A7-8B3E-2E64EBAC4683-xyzzy
I think you want the regex to begin/end with ^$, i.e.:
r'^(?i)[{]?(?:[0-9a-f]{4}-?){8}[}]?$'
Oh, interesting, I see from tests, and from trying with portal URLs that some trailing stuff is tolerated - so maybe this was intentional - hmm - (I'd like to track down our code that actually parses this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used already in schema validation just as it is. I'm going to leave this precisely how it is because it is what portals are defined to do and the whole point was to borrow the definition they were using for other uses and not to have multiple definitions. if someone reports a bug in what the portals do and we decide to fix it, my code will tolerate the fix since it's operating on the same data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had already updated the comment when I saw this. I agree it looked confused. Thanks for confirming that. :)