-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
artifacts: Review naming restrictions #9821
Comments
Couldn't find specific reasons for |
That limitation was chosen just for simplicity: better to have it limited and lift it later than to allow everything and decide that we need to remove some characters. So technically I think we can allow extra characters that don't interfere with gto special signs (@ and # and !) and with git tags limitations. The only other reason I can think of that worth considering is readability. |
@aguschin Are there any restrictions within gto now (besides git tag limitations), or is it only enforced at the dvc level? |
@dberenbaum it's restricted on the GTO side AFAIK. |
We can change things on GTO side as well IMO if there's no deep reason not to (cc @aguschin) |
~Having an issue with tests, but opening this for discussion in the meantime.~ The PR adds support for uppercase letters and `_`. I decided against support for `.` because there are some restrictions around how it can be used in [git refs](https://git-scm.com/docs/git-check-ref-format). Related: iterative/dvc#9821 TODO: - [x] Tests - [ ] DVC PR - [ ] DVCLive PR to improve [warning](https://github.com/iterative/dvclive/blob/95000f45638d5126b7d76833050f549167a7449d/src/dvclive/live.py#L462-L466) - [ ] Docs PR
Waiting on #9770 so we can use gto as a dep here and not duplicate the regex and other logic in dvc |
Next steps (Updated after further discussions in following comments):
ContextDiscussions related to this issue are spread out in multiple GH issues and Slack threads. I’ve tried to summarize them below. Do the next steps make sense?
|
There is a PR for this here
The above PR doesn't handle this. If we want it, we should probably add it in the same release so we don't have to make breaking changes.
No, I think we agreed here that it won't be needed in that case. |
I think if we agree that we escape symbols, I would not do any other transformations. For upper case symbols -if GH accepts is - let's push it. WDYT? |
I see that GH accepts it (see
I'm not sure why it's happening, but feels like this is too fragile to depend on. |
It's because you are on a platform with case-insensitive filesystems (windows or macos). Git ref names are technically case sensitive, and this would work as expected on linux (where the filesystems are case sensitive). But on windows/mac, this may or may not fail, depending on the state of your git repo. (It fails if git tries to create the files This is only an issue when creating a tag that may overlap with an existing one though. There's not really any reason to prevent users from using capitalized letters in the tag name. |
My concern would be what happens when users do create an overlapping tag since gto will see them as separate artifacts but may fail to create the tag, right? It is pretty unlikely, so maybe just warning about this or catching the exception is enough. |
Let's say I have a tag like |
Yes, this is correct (note that "may fail" is significant since it depends on whether or not the existing tag ref is packed)
No, git is case sensitive. Git can tell the difference between a tag named The issue is specifically with what happens on windows/mac. In the event that you have an unpacked/loose ref file named |
Because of this weirdness, simplified best-practice for handling git refs in a cross-platform project is to just always use lowercase for branch/tag names. However, since GTO tags can include actual directory paths, if you wanted to enforce a lowercase-only rule for GTO tags, you would have to extend the rule to require that users only store artifacts in directories with lowercase paths (unless the user only cares about windows/mac). Given a GTO tag like
Also keep in mind that this gets even worse taking into consideration that directory paths can contain non-ascii characters (i.e. latin characters with accents, Cyrillic, CJK, etc) , and how git stores those characters is also dependent on platform + system encoding settings. i.e. given something like |
Considering the above comments, would this be better?
|
I think that makes sense @tapadipti. Can we try to fail early and disallow having two artifacts in the same dvc.yaml with names that are only different in casing? Edit: also, in case it's unclear, this seems like an unlikely scenario, so I don't think it should block us moving forward with allowing uppercase. |
Discussed this with @dberenbaum a short while ago.
Allowing uppercase is already done. |
I've updated the next steps in this comment. ptal. |
Let's close this one and track there. |
Can we do this at the same time we remove all naming restrictions? Otherwise, I think it could introduce new errors and be a breaking change when we implement escape characters. Also, we only need to escape characters disallowed in Git refs, but it's possible it will be simpler to take some existing escaping library and escape all characters that it handles even if many of them would be valid Git refs. |
The comment with tasks is in this issue itself. So I'm re-opening the issue.
👍
Agreed. I'll search for what library to use. If you have any suggestions, pls share. |
@shcheklein mentioned URL escaping, and we could use urllib.parse.quote for that. |
@shcheklein You mentioned that GTO "created and pushed the tag before actually running a check on the tag name". I'm looking into the code and I can see that the UPDATE: Discussed this with @shcheklein. I tried the |
Related to iterative/dvc#9821 --------- Co-authored-by: Dave Berenbaum <dave.berenbaum@gmail.com>
Hello! GTO got upgraded, DVC is using the latest GTO version, and Studio BE depends on the latest DVC version. https://github.com/iterative/studio/issues/7903 incorporates the new validation rules when adding models on the Studio FE. FE changes will be released today. |
@tapadipti Do you plan to start on the "Later" items in #9821 (comment)? Is there anything else needed before that? |
I started working on one of the items in GTO (here), and will gradually pick the others. But probably not the DVC/Studio tasks. |
Earlier, the model name was not being validated for the deprecate model action. It was getting validated for all other actions including the unassign and deregister actions (which are variations of the deprecate action). Resolves iterative/dvc#9821 (comment)
Is this still p1? @tapadipti is it still in your plans? |
Lowering the priority of the remaining items |
The current name restriction seems too limiting (and the reasons for the current restrictions are not clear cc @aguschin ) :
dvc/dvc/repo/artifacts.py
Line 20 in c71b07b
It has come up in different places:
https://iterativeai.slack.com/archives/CB41NAL8H/p1691411376466919
https://github.com/iterative/studio/issues/6897#issuecomment-1670034865
I think we should first clarify the reasons for the restrictions and then decide on allowing additional cases
Edit: go to #9821 (comment)
The text was updated successfully, but these errors were encountered: