Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exp show: show metrics that include separator #9819

Merged
merged 7 commits into from
Aug 24, 2023
Merged

Conversation

dberenbaum
Copy link
Collaborator

Fixes dvc exp show --sort-by when : separator is included in metric name (see https://iterativeai.slack.com/archives/C03JS2V4MQU/p1691415243711839). Not sure if we should explicitly allow this, but I don't see any harm with this fix.

@dberenbaum dberenbaum requested a review from a team August 8, 2023 14:04
@dberenbaum
Copy link
Collaborator Author

cc @d-miketa

@codecov
Copy link

codecov bot commented Aug 8, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.05% 🎉

Comparison is base (88431b1) 90.81% compared to head (b12f501) 90.86%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9819      +/-   ##
==========================================
+ Coverage   90.81%   90.86%   +0.05%     
==========================================
  Files         471      471              
  Lines       35929    35945      +16     
  Branches     5194     5196       +2     
==========================================
+ Hits        32630    32663      +33     
+ Misses       2715     2697      -18     
- Partials      584      585       +1     
Files Changed Coverage Δ
dvc/repo/experiments/show.py 88.63% <100.00%> (+1.57%) ⬆️
tests/func/experiments/test_show.py 99.68% <100.00%> (+0.01%) ⬆️

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

for path in param_names:
if sort_name in param_names[path]:
matches.add((path, sort_name, "params"))
if sort_by in param_names[path]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to support it in --set-param or are we implying that this will only be supported for exp show / metrics?

(in the current fix we are also supporting params in the exp show)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth discussing, but I figured even this limited fix is better than the current situation.

path, _, sort_name = sort_by.rpartition(":")
path, _, sort_name = sort_by.partition(":")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed to address #9819 (comment), but not sure if there was a reason for using rpartition (cc @pmrowla)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can have colons in filenames.

@daavoo daavoo added A: experiments Related to dvc exp ui user interface / interaction enhancement Enhances DVC labels Aug 8, 2023
Copy link
Contributor

@daavoo daavoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't mean to block this.

For the record, would be good to test against Studio

@dberenbaum
Copy link
Collaborator Author

dberenbaum commented Aug 18, 2023

@daavoo Can we add a test in Studio rather than do it manually?

@dberenbaum
Copy link
Collaborator Author

Let's get approval from @pmrowla to make sure it doesn't break some expectation in the existing code.

@daavoo
Copy link
Contributor

daavoo commented Aug 18, 2023

Let's get approval from @pmrowla to make sure it doesn't break some expectation in the existing code.

If it does, let's please add a regression test

Copy link
Contributor

@pmrowla pmrowla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really a problem with the path:metric_or_param addressing in general, it breaks as soon as you have a colon in either the path or the metric/param name.

IMO we should just explicitly disallow the use of colons in filename + metric/param names for now. If this is really something users want then we need to have actual support for escaping the colon in CLI flags, so you would have to do --sort-by=metrics\:file.json:my\:metric (this affects exp run --set-param too)

path, _, sort_name = sort_by.rpartition(":")
path, _, sort_name = sort_by.partition(":")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can have colons in filenames.

@daavoo
Copy link
Contributor

daavoo commented Aug 22, 2023

This is really a problem with the path:metric_or_param addressing in general, it breaks as soon as you have a colon in either the path or the metric/param name.

IMO we should just explicitly disallow the use of colons in filename + metric/param names for now. If this is really something users want then we need to have actual support for escaping the colon in CLI flags, so you would have to do --sort-by=metrics\:file.json:my\:metric (this affects exp run --set-param too)

@pmrowla @dberenbaum can we instead check, for all the supported suffixes, if {suffix}: is in the string and use that for partition, otherwise ignore the : character ?

@pmrowla
Copy link
Contributor

pmrowla commented Aug 22, 2023

@pmrowla @dberenbaum can we instead check, for all the supported suffixes, if {suffix}: is in the string and use that for partition, otherwise ignore the : character ?

We could, but it's still naive given that DVC allows yaml metrics or params files to have any file extension (we explicitly treat .toml/.json/.py as those specific types and then load any other file as yaml)

@dberenbaum
Copy link
Collaborator Author

@pmrowla Updated the logic so it should handle : in filenames or metric names. PTAL.

Comment on lines 189 to 195
for split_num in range(1, len(parts)):
path = sep.join(parts[:split_num])
sort_name = sep.join(parts[split_num:])
if path in metric_names and sort_name in metric_names[path]:
matches.add((path, sort_name, "metrics"))
if path in param_names and sort_name in param_names[path]:
matches.add((path, sort_name, "params"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would have to go from range(0, len(parts)), it has to cover the case where there is no path, and only a metric containing a colon

IMO this is also more confusing than requiring escaping, and I think there are still edge cases where the user could have filenames and metric names that overlap and break this kind of behavior

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would have to go from range(0, len(parts)), it has to cover the case where there is no path, and only a metric containing a colon

Thanks, fixed.

IMO this is also more confusing than requiring escaping, and I think there are still edge cases where the user could have filenames and metric names that overlap and break this kind of behavior

My thoughts on this PR:

  1. It requires nothing of the user
  2. Edge cases will show the same error about being ambiguous that shows now unless I miss something
  3. Managing escape characters between how they get read in the user's shell and how we handle them seems more confusing to me (and more for the user to understand)
  4. Don't see much harm since the implementation already exists

Do you feel it's a blocker and escaping is the only way to go?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the existing behavior is already broken, we can merge this.

But my objection is more that this same problem affects every other command/flag combination that uses path:metric_or_param addressing (like exp run --set-param and probably also stage add --params). I would prefer to only have to fix this once, in a consistent and re-usable way. This PR is just patching over one symptom of a broader underlying problem

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also in the same vein, I'm not sure that we actually document that . is an invalid character in metric or parameter names, but for all intents and purposes we don't support it because we treat . as the dictionary level separator in metric/param addressing (even though it is valid to have string keys that contain . in yaml/json). It would be easier for us to just treat : the same way and disallow it entirely in metric/param names

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid points. As mentioned in the issue description, I don't see much harm in doing this unless/until we forbid these characters or have a better solution like escaping, so I'm going to merge but open an issue to discuss whether we should allow it.

@dberenbaum
Copy link
Collaborator Author

@pmrowla Are you okay approving or removing you change request? I opened #9877 to discuss a better solution, but I don't see this as a high priority since it hasn't been a common issue.

@dberenbaum dberenbaum merged commit ef260ea into main Aug 24, 2023
19 of 20 checks passed
@dberenbaum dberenbaum deleted the metric-name-colon branch August 24, 2023 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: experiments Related to dvc exp enhancement Enhances DVC ui user interface / interaction
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants