Possible expand_shards documentation typo

expand_shards should take the same input as data.root in the documentation, right?
justin13601 · Sep 14, 2024 · 03b962a
1 parent 4c066d7
commit 03b962a
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/docs/source/usage.md b/docs/source/usage.md
@@ -212,7 +212,7 @@ aces-cli cohort_name="foo" cohort_dir="bar/" data.standard=meds data.path="baz.p
 A MEDS dataset can have multiple shards, each stored as a `.parquet` file containing subsets of the full dataset. We can make use of Hydra's launchers and multi-run (`-m`) capabilities to start an extraction job for each shard (`data=sharded`), either in series or in parallel (e.g., using `joblib`, or `submitit` for Slurm). To load data with multiple shards, a data root needs to be provided, along with an expression containing a comma-delimited list of files for each shard. We provide a function `expand_shards` to do this, which accepts a sequence representing `<shards_location>/<number_of_shards>`. It also accepts a file directory, where all `.parquet` files in its directory and subdirectories will be included.
 
 ```bash
-aces-cli cohort_name="foo" cohort_dir="bar/" data.standard=meds data=sharded data.root="baz/" "data.shard=$(expand_shards qux/#)" -m
+aces-cli cohort_name="foo" cohort_dir="bar/" data.standard=meds data=sharded data.root="baz/" "data.shard=$(expand_shards baz/)" -m
 ```
 
 ### ESGPT