dvclive: update how to track results #4674
content/docs/dvclive/how-it-works.md
Outdated
<admon type="tip">

`save_dvc_exp=True` is ignored when [running with DVC](#run-with-dvc) since
Should it be `save_dvc_exp=False` that is ignored? Or just `save_dvc_exp`?
The increased complexity worries me, although I don't have that many ideas on how to fight that. I would prefer to keep DVCLive docs about the happy path.
I see some potential changes like:
- Making `save_dvc_exp=True` the default in DVCLive so we could drop all the paragraphs about it.
- Dropping "Track large artifacts with DVC" from here. We could say something like "use `log_artifact` to track with DVC" and redirect to a DVC page about data management.
- Dropping "Run with DVC". We could say "If you have or want to use a DVC pipeline, go here" and link to a DVC page about pipelines.
- Dropping "Customize with DVC". It feels like it should be part of "Running with DVC".
Same. And I also don't know a good solution for this yet. It feels like we need to brainstorm the next iteration: what else can we do to make it simpler?
Not pretending to know the right balance of simplicity vs complexity which we are always struggling to get right, but my sense from recent feedback is that we have enough simple happy-path examples, and people struggle to understand how things work beyond that. This page to me is the equivalent of the dvclive user guide, where I would expect an in-depth explanation of how things work. How does it hurt the happy path?
We can do this next release, but I think we should still mention here how it works or there's no way for people to understand what it does or the dangers of setting it to false.
This already links to those pages, but I think it's helpful to discuss how it specifically applies to the dvclive scenario.
What about customizing plots? It doesn't feel to me like it belongs in
Discussed a couple concerns with @daavoo:
Let me know if I missed anything. I'll think on these and try to do another draft.
I took another pass at this and here's what I have:
I'm also open to moving all the info into @shcheklein @daavoo PTAL when you have a chance 🙏
Also note that this would help with iterative/dvclive#631. We could catch cases where users call
Seeing how much space we spend warning about not writing to
@shcheklein @daavoo Any thoughts here? Do you feel it's better to close it?
content/docs/dvclive/how-it-works.md
Outdated
same path and overwrite the results each time. Include
### Git integration

Unlike other experiment trackers, DVCLive relies on Git to track the [directory]
My 2c: I think "Track results" could start with more basic material that more people can relate to and understand faster:
1. That we can track them in VS Code and Studio.
2. Maybe ways to compare experiments, or just experiments, or tracking experiments. That's where we can go into Git concepts to a certain degree, large files, etc. (even though I still think we need
The biggest issue with the explanation is that people don't expect it, and most likely can't even understand why we put it here until they hit some issues.
Maybe another idea: "DVCLive vs other trackers: important workflow details".
Renamed from "Track the results" to "Git and DVC integration" and introduced it by explaining that this differentiates it from other experiment trackers.
content/docs/dvclive/how-it-works.md
Outdated
Using `Live.log_image()` to log multiple images may also grow too large to track
with Git, in which case you can use
[`Live(cache_images=True)`](/doc/dvclive/live#parameters) to cache them.

### Run with DVC
### Customize with DVC
That's probably also a bit too much? Even if we keep it, should it be part of "Run with DVC"?
Moved to part of Run with DVC and consolidated slightly.
experiment run. Instead, write customizations to a new `dvc.yaml` file at the
base of your repository or elsewhere outside the DVCLive directory.

## Run with DVC

Experimenting in Python interactively (like in notebooks) is great for
are there any other benefits?
There are more benefits listed later in the paragraph.
Yep, that's fine. It's just a bit abstract to me (as an end user). I mean the "more structured way to run reproducible experiments" part, and `parallelized hyperparameter search` jumps right into the advanced case. Again, I'm paying a lot of attention to this here since I expect the readers of this won't be DVC users, or even necessarily advanced Git users. There should be a story using their language and terminology as much as possible. Sorry, Dave, for all these iterations; no intent to block it. I'm fine to merge it any time since it's an improvement already.
Changed the examples here from `parallelized hyperparameter search` to `multi-step pipeline or queueing multiple experiments`.
I think the added information is valuable, despite the concerns about formatting/location.
@shcheklein Did one more round of iterations. Let me know if you want to take a look.
DVCLive expects each run to be tracked by Git, so it will save each run to the
same path and overwrite the results each time. Include
DVCLive differs from some other experiment trackers by relying on Git and DVC
for tracking instead of a central database. This provides a closer connection to
Quick thought: I guess it's somewhat similar to TensorBoard, btw (no Git, but also no central database).
Opening this in place of #4660 based on the comment to keep everything in one page.
Closes #4644.
This separates how dvclive tracks results and works with git and dvc into its own page. Before merging, we should decide where these explanations are sufficient, and where we need to make product updates to simplify.
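For anyone skimming this thread, the behavior being documented (each run written to the same path and overwritten, with history kept by version control instead of a central database) can be illustrated with a toy stand-in. This is a stdlib-only sketch, not DVCLive's implementation: `write_run`, `snapshot`, `demo`, and the `dvclive/metrics.json` layout used here are assumptions made for illustration, and `snapshot` only imitates what a Git commit or DVC experiment would do.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def write_run(root: Path, metrics: dict) -> Path:
    """Write metrics to one fixed path, replacing whatever the last run left."""
    out = root / "dvclive" / "metrics.json"  # hypothetical layout for this sketch
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(metrics, sort_keys=True))
    return out


def snapshot(path: Path, history: dict) -> str:
    """Stand-in for a commit: store the current file content under its hash."""
    content = path.read_text()
    key = hashlib.sha1(content.encode()).hexdigest()[:8]
    history[key] = content
    return key


def demo() -> tuple:
    history = {}
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # First run: results land at the fixed path and are snapshotted.
        first = write_run(root, {"loss": 0.9})
        first_id = snapshot(first, history)
        # Second run: same path, so the first run's file is overwritten.
        second = write_run(root, {"loss": 0.4})
        second_id = snapshot(second, history)
        on_disk = json.loads(second.read_text())
    runs = [json.loads(history[k]) for k in (first_id, second_id)]
    return first == second, on_disk, runs
```

Only the latest run survives on disk; earlier results are recoverable solely through the snapshots. That is the reason the thread spends so much time on what happens when a run is not tracked.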