dvclive: update how to track results #4674
Changes from 6 commits
b61eca3
7a24fb3
1589d59
5b048b8
5d329ea
e431055
977c490
fc31ee9
effbfb8
f57b700
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -73,43 +73,54 @@ model.pt.dvc | |
|
||
## Track the results | ||
|
||
DVCLive expects each run to be tracked by Git, so it will save each run to the | ||
same path and overwrite the results each time. Include | ||
### Git integration | ||
|
||
Unlike other experiment trackers, DVCLive relies on Git to track the [directory] | ||
it generates, so it will save each run to the same path and overwrite the | ||
results each time. DVCLive uses Git to manage results, code changes, and data | ||
changes ([with DVC](#track-large-artifacts-with-dvc)). Include | ||
[`save_dvc_exp=True`](/doc/dvclive/live#parameters) to auto-track as a <abbr>DVC | ||
experiment</abbr>. DVC experiments are Git commits that DVC can find but that | ||
don't clutter your Git history or create extra branches. | ||
experiment</abbr> so you don't need to worry about manually making Git commits | ||
or branches for each experiment. You can recover them using `dvc exp` commands | ||
or using Git. | ||
|
||
### Track large artifacts with DVC | ||
|
||
Models and data are often large and aren't easily tracked in Git. | ||
`Live.log_artifact("model.pt", type="model")` will | ||
[cache](/doc/start/data-management/data-versioning) the `model.pt` file with DVC | ||
and make Git ignore it. It will generate a `model.pt.dvc` metadata file, which | ||
can be tracked in Git and becomes part of the experiment. With this metadata | ||
file, you can [retrieve](/doc/start/data-management/data-versioning#retrieving) | ||
the versioned artifact from the Git commit. | ||
|
||
If `Live` was initialized with `dvcyaml=True` (which is the default) and you | ||
include values for any of the optional metadata arguments, this will add an | ||
[artifact](/doc/user-guide/project-structure/dvcyaml-files#artifacts) to the | ||
corresponding `dvc.yaml`. Passing `type="model"` will mark it as a `model` for | ||
DVC and will also show it in | ||
[Studio Model Registry](/doc/studio/user-guide/model-registry/what-is-a-model-registry). | ||
`Live.log_artifact("model.pt")` will [cache] the `model.pt` file with DVC and | ||
make Git ignore it. It will generate a `model.pt.dvc` metadata file, which can | ||
be tracked in Git and becomes part of the experiment. With this metadata file, | ||
you can [retrieve](/doc/start/data-management/data-versioning#retrieving) the | ||
versioned artifact from the Git commit. You can also use | ||
`Live.log_artifact("model.pt", type="model")` to add it to the [Studio Model | ||
Registry]. | ||
|
||
Using `Live.log_image()` to log multiple images may also grow too large to track | ||
with Git, in which case you can use | ||
[`Live(cache_images=True)`](/doc/dvclive/live#parameters) to cache them. | ||
|
||
### Run with DVC | ||
### Customize with DVC | ||
Review comment: That's probably also a bit too much? Even if we keep it, should it be part of the Run with DVC?
Reply: Moved to part of Run with DVC and consolidated slightly. |
||
|
||
DVCLive by default [generates] its own `dvc.yaml` file to configure the | ||
experiment results, but you can create your own `dvc.yaml` file to customize | ||
your project. For example, to define a [pipeline](#run-with-dvc) or | ||
[customize plots](/doc/user-guide/experiment-management/visualizing-plots#defining-plots). | ||
Do not reuse the DVCLive `dvc.yaml` file since it gets overwritten during each | ||
experiment run. Instead, write customizations to a new `dvc.yaml` file at the | ||
base of your repository or elsewhere outside the DVCLive directory. | ||
|
||
## Run with DVC | ||
|
||
Experimenting in Python interactively (like in notebooks) is great for | ||
Review comment: Are there any other benefits?
Reply: There are more benefits listed later in the paragraph.
Review comment: Yep, that's fine - it's just a bit abstract to me (as an end user). I mean the "more structured way to run
Reply: Changed the examples here from |
||
exploration, but eventually you may need a more structured way to run | ||
reproducible experiments (for example, running a parallelized hyperparameter | ||
search). By configuring DVC <abbr>pipelines</abbr>, you can | ||
search). By configuring DVC [pipelines], you can | ||
[run experiments](/doc/user-guide/experiment-management/running-experiments) | ||
with `dvc exp run`. | ||
with `dvc exp run`. This will track the inputs and outputs of your code, and | ||
also enable features like queuing, parameter tuning, and grid searches. | ||
|
||
You can configure a pipeline stage in `dvc.yaml` like: | ||
You can configure a pipeline stage in your own `dvc.yaml` file at the base of | ||
the repository (see [Customize with DVC](#customize-with-dvc)): | ||
|
||
```yaml | ||
stages: | ||
|
@@ -121,21 +132,42 @@ stages: | |
- model.pt | ||
``` | ||
|
||
Add this pipeline stage into `dvc.yaml`, modifying it to fit your project. Then, | ||
run it with `dvc exp run`. This will track the inputs and outputs of your code, | ||
and also enable features like queuing, parameter tuning, and grid searches. | ||
|
||
<admon type="warn"> | ||
<admon type="tip"> | ||
|
||
Add to a `dvc.yaml` file at the base of your repository. Do not use | ||
`dvclive/dvc.yaml` since DVCLive will overwrite it during each run. | ||
You may have previously tracked [outputs] with `Live.log_artifact()` that | ||
generated a `.dvc` file like `model.pt.dvc`. DVC will not allow you to also add | ||
`model.pt` as a pipeline [output][outputs] since it is already tracked by | ||
`model.pt.dvc`. You must `dvc remove model.pt.dvc` before you can add it to the | ||
pipeline. You can optionally drop `Live.log_artifact()` from your code. | ||
|
||
</admon> | ||
|
||
<admon type="tip"> | ||
Optionally add any subpaths of the DVCLive [directory] to the [outputs]. DVC | ||
will [cache] them by default, and you can use those paths as [dependencies] | ||
downstream in your pipeline. For example, to cache all DVCLive plots: | ||
|
||
```diff | ||
stages: | ||
dvclive: | ||
cmd: <python my_code_file.py my_args> | ||
deps: | ||
- <my_code_file.py> | ||
outs: | ||
- model.pt | ||
+ - dvclive/plots | ||
``` | ||
|
||
If you already have a `.dvc` file like `model.pt.dvc`, DVC will not allow you to | ||
also track `model.pt` in `dvc.yaml`. You must `dvc remove model.pt.dvc` before | ||
you can add it to `dvc.yaml`. | ||
<admon type="warn"> | ||
|
||
Do not add the entire DVCLive [directory] since DVC does not expect the DVCLive | ||
`dvc.yaml` file to be inside the [outputs]. | ||
|
||
</admon> | ||
|
||
[directory]: /doc/dvclive/how-it-works#directory-structure | ||
[studio model registry]: /doc/studio/user-guide/model-registry | ||
[cache]: /doc/start/data-management/data-versioning | ||
[outputs]: /doc/user-guide/pipelines/defining-pipelines#outputs | ||
[dependencies]: /doc/user-guide/pipelines/defining-pipelines#simple-dependencies | ||
[pipelines]: /doc/start/experiments/experiment-pipelines | ||
[generates]: /doc/dvclive/live/make_dvcyaml |
Review comment:
My 2 cents: I think "Track the results" could start with more basic material that more people can relate to and understand faster.
1. That we can track them in VS Code and Studio.
2. Maybe ways to compare experiments, or just experiments, or tracking experiments - that is where we can go into Git concepts to a certain degree, large files, etc. (even though I still think we need
The biggest issue with the explanation is that people don't expect it and most likely can't even understand why we put it here until they hit some issues.
Maybe another idea: "DVCLive vs other trackers: important workflow details".
Reply:
Renamed from "Track the results" to "Git and DVC integration" and introduced it by explaining that this differentiates it from other experiment trackers.
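
For readers who find the Git and DVC integration text abstract, here is a minimal Python sketch of what the revised paragraphs describe: logging metrics, then registering `model.pt` with `Live.log_artifact(..., type="model")`, inside a `Live(save_dvc_exp=True)` run. It is an illustration and not part of this PR. The `train` and `main` helpers, the metric name `train/loss`, the placeholder loss values, and the placeholder file contents are all assumptions made for the example; only `save_dvc_exp=True` and `log_artifact("model.pt", type="model")` come from the diff.

```python
from pathlib import Path


def train(live, epochs=3, model_path="model.pt"):
    """Log one metric per epoch, then register the model file as an artifact.

    `live` is expected to behave like `dvclive.Live`; only `log_metric`,
    `next_step`, and `log_artifact` are used here.
    """
    for epoch in range(epochs):
        # Placeholder value; a real loop would compute the loss from the model.
        live.log_metric("train/loss", 1.0 / (epoch + 1))
        live.next_step()

    # Stand-in for saving real model weights (hypothetical file contents).
    Path(model_path).write_bytes(b"placeholder weights")

    # Per the docs text in the diff: caches the file with DVC, writes a
    # `.dvc` metadata file, and marks the artifact as a model.
    live.log_artifact(model_path, type="model")


def main():
    # Imported lazily so `train` can be read and tested without DVCLive installed.
    from dvclive import Live

    # Assumes the current directory is a Git repository with DVC initialized.
    with Live(save_dvc_exp=True) as live:
        train(live)
```

Calling `main()` from a repository set up as assumed above should produce the DVCLive directory and a `model.pt.dvc` file, as the docs text describes. Keeping the `Live` object as a parameter of `train` is a design choice for this sketch only: it makes it easy to drop `log_artifact` later if `model.pt` becomes a pipeline output, which the tip in the diff says cannot be tracked both ways.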