Skip to content

Commit

Permalink
Possible expand_shards documentation typo
Browse files Browse the repository at this point in the history
expand_shards should take in the same input as data.root in the documentation right?
  • Loading branch information
Oufattole authored Sep 14, 2024
1 parent 4c066d7 commit 03b962a
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/source/usage.md
Original file line number Diff line number Diff line change
Expand Up @@ -212,7 +212,7 @@ aces-cli cohort_name="foo" cohort_dir="bar/" data.standard=meds data.path="baz.p
A MEDS dataset can have multiple shards, each stored as a `.parquet` file containing subsets of the full dataset. We can make use of Hydra's launchers and multi-run (`-m`) capabilities to start an extraction job for each shard (`data=sharded`), either in series or in parallel (e.g., using `joblib`, or `submitit` for Slurm). To load data with multiple shards, a data root needs to be provided, along with an expression containing a comma-delimited list of files for each shard. We provide a function `expand_shards` to do this, which accepts a sequence representing `<shards_location>/<number_of_shards>`. It also accepts a file directory, where all `.parquet` files in its directory and subdirectories will be included.

```bash
aces-cli cohort_name="foo" cohort_dir="bar/" data.standard=meds data=sharded data.root="baz/" "data.shard=$(expand_shards qux/#)" -m
aces-cli cohort_name="foo" cohort_dir="bar/" data.standard=meds data=sharded data.root="baz/" "data.shard=$(expand_shards baz/)" -m
```

### ESGPT
Expand Down

0 comments on commit 03b962a

Please sign in to comment.