guide: document matrix in dvc.yaml
skshetry committed Aug 14, 2023
1 parent edcd9d1 commit 304fa4b
Showing 4 changed files with 70 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .dvc/.gitignore
@@ -0,0 +1,2 @@
/config.local
/cache
Empty file added .dvc/config
3 changes: 3 additions & 0 deletions .dvcignore
@@ -0,0 +1,3 @@
# Add patterns of files dvc should ignore, which could improve
# the performance. Learn more at
# https://dvc.org/doc/user-guide/dvcignore
65 changes: 65 additions & 0 deletions content/docs/user-guide/project-structure/dvcyaml-files.md
@@ -745,6 +745,71 @@ Both individual foreach stages (`train@1`) and groups of foreach stages

</admon>

## `matrix` stages

`matrix` allows you to define multiple stages based on combinations of
variables. A `matrix` element accepts one or more variables, each with a list
of possible values to iterate over. For example:

```yaml
stages:
  train:
    matrix:
      model: [cnn, xgb]
      feature: [feature1, feature2, feature3]
    cmd: ./train.py --feature ${item.feature} ${item.model}
    outs:
      - ${item.model}-${item.feature}.pkl
```

The values of the variables become available in the `item` dictionary, so you
can reference each variable in your stage definition. In the above example, you
can access `item.model` and `item.feature`.
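
As a rough illustration of how `${item.<name>}` references are filled in for a
single combination, here is a minimal Python sketch. The `render` function and
its regex are hypothetical and only substitute top-level `item` keys; DVC's own
templating supports more than this.

```python
import re

# Matches ${item.<name>} placeholders. Hypothetical and simplified: this is not
# DVC's templating engine, which also resolves vars and other syntax.
ITEM_REF = re.compile(r"\$\{item\.([A-Za-z_][A-Za-z0-9_]*)\}")


def render(template: str, item: dict) -> str:
    """Replace every ${item.<name>} in template with str(item[name])."""
    return ITEM_REF.sub(lambda m: str(item[m.group(1)]), template)


item = {"model": "cnn", "feature": "feature1"}
print(render("./train.py --feature ${item.feature} ${item.model}", item))
# ./train.py --feature feature1 cnn
```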

On `dvc repro`, DVC will expand the definition into multiple stages, one for
each possible combination of the variables. In the above example, DVC will
create six stages, one for each combination of `model` and `feature`. Stage
names are generated by appending the values of the variables, joined with `-`,
to the stage name after a `@`, as with `foreach`. For example, DVC will create
the following stages:

- _train@cnn-feature1_
- _train@cnn-feature2_
- _train@cnn-feature3_
- _train@xgb-feature1_
- _train@xgb-feature2_
- _train@xgb-feature3_
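
The expansion amounts to a Cartesian product of the variables' value lists. The
Python sketch below reproduces the six names above under that assumption;
`expand_matrix` is a hypothetical helper written for illustration, not part of
DVC, and it only handles simple values.

```python
from itertools import product


def expand_matrix(stage: str, matrix: dict) -> dict:
    """Return {stage_name: item} for every combination of matrix values.

    Hypothetical sketch: handles simple (string/number) values only.
    """
    keys = list(matrix)
    stages = {}
    # product() varies the last variable fastest, so combinations follow
    # the order in which the variables are declared.
    for values in product(*(matrix[k] for k in keys)):
        item = dict(zip(keys, values))
        name = f"{stage}@" + "-".join(str(v) for v in values)
        stages[name] = item
    return stages


stages = expand_matrix(
    "train",
    {"model": ["cnn", "xgb"], "feature": ["feature1", "feature2", "feature3"]},
)
for name, item in stages.items():
    print(name, item)
```

Because `itertools.product` varies the last variable fastest, the names come
out in the same order as the list above.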

Both individual matrix stages (e.g. `train@cnn-feature1`) and groups of matrix
stages (`train`) may be used in commands that accept stage targets.

The values of a variable can be simple values, such as strings and integers, or
composite values, such as lists and dictionaries. For example:

```yaml
matrix:
  labels:
    - [label1, label2, label3]
    - [labelX, labelY, labelZ]
  config:
    - n_estimators: 150
      max_depth: 20
    - n_estimators: 120
      max_depth: 30
```

When a value is a list or a dictionary, DVC generates the stage name from the
variable name and an id based on the value's index. In the above example, the
generated stages include names such as `train@labels0-config0`.
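
A minimal Python sketch of this naming rule, assuming that lists and
dictionaries are named `<variable><index>` while simple values keep their own
text. `value_ids` and `stage_names` are hypothetical helpers written for
illustration.

```python
from itertools import product


def value_ids(name, values):
    """Pair each value with the id used for it in generated stage names.

    Hypothetical sketch: lists and dicts become f"{name}{index}";
    simple values use their own text.
    """
    return [
        (f"{name}{i}" if isinstance(v, (list, dict)) else str(v), v)
        for i, v in enumerate(values)
    ]


def stage_names(stage, matrix):
    """List the names of all stages a matrix would expand to."""
    columns = [value_ids(name, values) for name, values in matrix.items()]
    return [
        f"{stage}@" + "-".join(vid for vid, _ in combo)
        for combo in product(*columns)
    ]


matrix = {
    "labels": [["label1", "label2", "label3"], ["labelX", "labelY", "labelZ"]],
    "config": [
        {"n_estimators": 150, "max_depth": 20},
        {"n_estimators": 120, "max_depth": 30},
    ],
}
print(stage_names("train", matrix))
# ['train@labels0-config0', 'train@labels0-config1', 'train@labels1-config0', 'train@labels1-config1']
```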

Note that templating is supported both inside `matrix` and in the rest of the
stage definition, so you can reference values defined in `vars` or loaded from
imported files as usual:

```yaml
matrix:
  labels: ${labels}
  config: ${config}
```
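
Conceptually, a whole-value reference such as `${labels}` is resolved to the
referenced list before the combinations are generated. The Python sketch below
assumes a plain `variables` dictionary standing in for whatever DVC loads from
`vars` or imported files; `resolve_matrix` is hypothetical and only handles
`${name}` references that make up the entire value.

```python
import math
import re

# Matches a value that is exactly one reference such as "${labels}".
# Hypothetical simplification: no dotted paths and no interpolation inside
# longer strings.
WHOLE_REF = re.compile(r"^\$\{([A-Za-z_][A-Za-z0-9_]*)\}$")


def resolve_matrix(matrix: dict, variables: dict) -> dict:
    """Replace whole-value ${name} references with variables[name]."""
    resolved = {}
    for name, values in matrix.items():
        match = WHOLE_REF.match(values) if isinstance(values, str) else None
        resolved[name] = variables[match.group(1)] if match else values
    return resolved


# Hypothetical stand-in for values loaded from `vars` or imported files.
variables = {
    "labels": [["label1", "label2", "label3"], ["labelX", "labelY", "labelZ"]],
    "config": [
        {"n_estimators": 150, "max_depth": 20},
        {"n_estimators": 120, "max_depth": 30},
    ],
}

matrix = resolve_matrix({"labels": "${labels}", "config": "${config}"}, variables)
# Number of stages the resolved matrix would expand to: 2 labels x 2 configs.
print(math.prod(len(v) for v in matrix.values()))  # 4
```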

## dvc.lock file

To record the state of your pipeline(s) and help track its <abbr>outputs</abbr>,