
Commit

More reference and fixes
justin13601 committed Jun 13, 2024
1 parent da12084 commit 457ffcf
Showing 4 changed files with 31 additions and 16 deletions.
14 changes: 9 additions & 5 deletions docs/source/notebooks/examples.ipynb
@@ -6,10 +6,10 @@
"source": [
"# Task Examples\n",
"\n",
"Provided below are two examples of mortality prediction tasks for which ACES can easily extract subject cohorts. The configurations have been tested on all of the provided synthetic data in the repository (`../../../sample_data/`), as well as on the MIMIC-IV dataset loaded using MEDS & ESGPT (with very minor changes to the predicate definitions below). The configuration files for both of these tasks are provided in the repository (`../../../sample_configs`), and cohorts can be extracted using the `aces-cli` tool:\n",
"Provided below are two examples of mortality prediction tasks for which ACES can easily extract subject cohorts. The configurations have been tested on all of the provided synthetic data in the repository ([`sample_data/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_data)), as well as on the MIMIC-IV dataset loaded using MEDS & ESGPT (with very minor changes to the predicate definitions below). The configuration files for both of these tasks are provided in the repository ([`sample_configs/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_configs)), and cohorts can be extracted using the `aces-cli` tool:\n",
"\n",
"```bash\n",
"aces-cli data.path='/path/to/MIMIC/ESGPT/schema/' data.standard='esgpt' cohort_dir='../../../sample_configs' cohort_name='...'\n",
"aces-cli data.path='/path/to/MIMIC/ESGPT/schema/' data.standard='esgpt' cohort_dir='sample_configs/' cohort_name='...'\n",
"```"
]
},
@@ -138,11 +138,15 @@
"\n",
"The windows section contains the remaining three windows we defined previously: `input`, `gap`, and `target`.\n",
"\n",
"`input` begins at the start of a patient's record (i.e., `NULL`), and ends 24 hours past `trigger` (i.e., `admission`). As we'd like to include the events specified at both the start and end of `input`, if present, we can set both `start_inclusive` and `end_inclusive` to `True`. Our constraint on the number of records is specified in `has` using the `_ANY_EVENT` predicate, with its value set to be greater than or equal to 5 (i.e., unbounded on the right, as seen in `(5, None)`). **Note**: Since we'd like to make a prediction at the end of `input`, we can set `index_timestamp` to `end`, which corresponds to the timestamp of `trigger + 24h`.\n",
"`input` begins at the start of a patient's record (i.e., `NULL`), and ends 24 hours past `trigger` (i.e., `admission`). As we'd like to include the events specified at both the start and end of `input`, if present, we can set both `start_inclusive` and `end_inclusive` to `True`. Our constraint on the number of records is specified in `has` using the `_ANY_EVENT` predicate, with its value set to be greater than or equal to 5 (i.e., unbounded on the right, as seen in `(5, None)`).\n",
"\n",
"**Note**: Since we'd like to make a prediction at the end of `input`, we can set `index_timestamp` to `end`, which corresponds to the timestamp of `trigger + 24h`.\n",
"\n",
"`gap` also begins at `trigger`, and ends 48 hours after it. As we have already included the left boundary event in `trigger` (i.e., `admission`), it is reasonable not to include it again, as it should not play a role in `gap`. As such, we set `start_inclusive` to `False`. As we'd like our admission to be at least 48 hours long, we can place constraints specifying that there cannot be any `admission`, `discharge`, or `death` events in `gap` (i.e., bounded on the right at `0`, as seen in `(None, 0)`).\n",
"\n",
"`target` begins at the end of `gap`, and ends at the next discharge or death event (i.e., the `discharge_or_death` predicate). We can use arrow notation, which ACES recognizes as an event reference (i.e., `->` and `<-`; see [Time Range Fields](https://eventstreamaces--39.org.readthedocs.build/en/39/configuration.html#time-range-fields)). In our case, we end `target` at the next `discharge_or_death`. Similarly, as any event at the end of `gap` is already included in `gap`, we can set `start_inclusive` to `False`. **Note**: Since we'd like to make a binary mortality prediction, we can extract the `death` predicate as a label from `target` by specifying the `label` field to be `death`."
"`target` begins at the end of `gap`, and ends at the next discharge or death event (i.e., the `discharge_or_death` predicate). We can use arrow notation, which ACES recognizes as an event reference (i.e., `->` and `<-`; see [Time Range Fields](https://eventstreamaces.readthedocs.io/en/latest/technical.html#time-range-fields)). In our case, we end `target` at the next `discharge_or_death`. Similarly, as any event at the end of `gap` is already included in `gap`, we can set `start_inclusive` to `False`.\n",
"\n",
"**Note**: Since we'd like to make a binary mortality prediction, we can extract the `death` predicate as a label from `target` by specifying the `label` field to be `death`."
]
},
{
@@ -269,7 +273,7 @@
"source": [
"## Other Examples\n",
"\n",
"A few other examples are provided in the `../../../sample_configs/` directory of the repository. We will continue to add task configurations to this folder, as well as to a benchmarking effort for EHR representation learning. More information can be found [here](https://github.com/mmcdermott/PIE_MD/tree/main) - stay tuned!"
"A few other examples are provided in the [`sample_configs/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_configs) directory of the repository. We will continue to add task configurations to this folder, as well as to a benchmarking effort for EHR representation learning. More information can be found [here](https://github.com/mmcdermott/PIE_MD/tree/main) - stay tuned!"
]
}
],
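For readers skimming this diff, a consolidated sketch of the `windows` block described in the cell above may help. This is a hedged reconstruction from the prose, not a copy of the repository's config; the field names follow the ACES schema referenced in the text, while the exact expressions (e.g., `start + 48h`, `start -> discharge_or_death`) are assumptions; see `sample_configs/inhospital_mortality.yaml` in the repository for the authoritative version.

```yaml
# Sketch only: reconstructed from the explanation above, not taken verbatim from the repo.
windows:
  input:
    start: NULL                  # beginning of the patient's record
    end: trigger + 24h           # 24 hours past the admission trigger
    start_inclusive: True
    end_inclusive: True
    has:
      _ANY_EVENT: (5, None)      # at least 5 events, unbounded above
    index_timestamp: end         # prediction is made at trigger + 24h
  gap:
    start: trigger
    end: start + 48h             # admission must span at least 48 hours
    start_inclusive: False
    end_inclusive: True
    has:
      admission: (None, 0)       # no new admission in the gap
      discharge: (None, 0)       # no discharge in the gap
      death: (None, 0)           # no death in the gap
  target:
    start: gap.end
    end: start -> discharge_or_death   # next discharge or death after the gap
    start_inclusive: False
    end_inclusive: True
    label: death                 # binary in-hospital mortality label
```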
2 changes: 1 addition & 1 deletion docs/source/notebooks/predicates.ipynb
@@ -66,7 +66,7 @@
"source": [
"## Sample Predicates DataFrame\n",
"\n",
"A sample predicates dataframe is provided in the repository (`../../../sample_data/sample_data.csv`). This dataframe holds completely synthetic data and was designed so that cohorts can be extracted from it directly using the accompanying sample configuration files in the repository (`../../../sample_configs`)."
"A sample predicates dataframe is provided in the repository ([`sample_data/sample_data.csv`](https://github.com/justin13601/ACES/blob/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_data/sample_data.csv)). This dataframe holds completely synthetic data and was designed so that cohorts can be extracted from it directly using the accompanying sample configuration files in the repository ([`sample_configs/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_configs))."
]
},
{
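To take a quick look at this sample dataframe before running ACES, a minimal snippet such as the following may be used (a sketch only: it assumes polars is installed and that the code is run from the repository root; no column names are assumed).

```python
# Peek at the synthetic sample predicates dataframe shipped with the repository.
import polars as pl

df = pl.read_csv("sample_data/sample_data.csv")
print(df.shape)   # (rows, columns)
print(df.head())  # first few rows, including the predicate columns
```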
4 changes: 2 additions & 2 deletions docs/source/notebooks/tutorial.ipynb
@@ -47,7 +47,7 @@
"source": [
"### Directories\n",
"\n",
"Next, let's specify our paths and directories. In this tutorial, we will extract a cohort for a typical in-hospital mortality prediction task from the ESGPT synthetic sample dataset. The task configuration file and sample data are both shipped with the repository, in the `sample_configs` and `sample_data` folders in the project root, respectively."
"Next, let's specify our paths and directories. In this tutorial, we will extract a cohort for a typical in-hospital mortality prediction task from the ESGPT synthetic sample dataset. The task configuration file and sample data are both shipped with the repository, in the [`sample_configs/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_configs) and [`sample_data/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_data) folders in the project root, respectively."
]
},
{
@@ -102,7 +102,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We now load our configuration file by passing its path (`str`) into `config.TaskExtractorConfig.load()`. This parses the configuration file for each of the three key sections indicated above and prepares ACES for extraction based on our defined constraints (inclusion/exclusion criteria for each window)."
"We now load our configuration file by passing its path (`str`) into {py:func}`aces.config.TaskExtractorConfig.load()`. This parses the configuration file for each of the three key sections indicated above and prepares ACES for extraction based on our defined constraints (inclusion/exclusion criteria for each window)."
]
},
{
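As a rough illustration of the loading step described in this notebook, a minimal sketch is shown below. Only `aces.config.TaskExtractorConfig.load()` is named in the documentation; the file path is an assumption for illustration.

```python
# Sketch: load a task configuration file by its path (str), as described above.
from aces import config

task_cfg = config.TaskExtractorConfig.load("sample_configs/inhospital_mortality.yaml")
print(type(task_cfg))  # parsed configuration (predicates, trigger, and windows)
```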
27 changes: 19 additions & 8 deletions docs/source/usage.md
@@ -67,13 +67,22 @@ windows:

You can now run `aces-cli` in your terminal. Suppose we have a directory structure like the following:

```yaml
ACES/ ├── sample_data/ │ ├── esgpt_sample/ │ │ ├── ... │ │ ├── events_df.parquet
│ │ └── dynamic_measurements_df.parquet │ ├── meds_sample/ │ │ ├── shards/
│ │ │ ├── 0.parquet │ │ │ └── 1.parquet │ │ └── sample_shard.parquet
│ └── sample_data.csv ├── sample_configs/ │ └── inhospital_mortality.yaml └──
...
...
```
ACES/
├── sample_data/
│ ├── esgpt_sample/
│ │ ├── ...
│ │ ├── events_df.parquet
│ │ └── dynamic_measurements_df.parquet
│ ├── meds_sample/
│ │ ├── shards/
│ │ │ ├── 0.parquet
│ │ │ └── 1.parquet
│ │ └── sample_shard.parquet
│ └── sample_data.csv
├── sample_configs/
│ └── inhospital_mortality.yaml
└── ...
```
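With this layout, one plausible invocation against the ESGPT sample data would mirror the command shown earlier (the cohort name is assumed to match the config file under `sample_configs/`):

```bash
aces-cli data.path='sample_data/esgpt_sample/' data.standard='esgpt' cohort_dir='sample_configs/' cohort_name='inhospital_mortality'
```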

**To query from a single MEDS shard**:
@@ -174,7 +183,9 @@ The above two fields are used for automatically loading task configurations, sav

`config_path`: Path to the task configuration file. Defaults to `${cohort_dir}/${cohort_name}.yaml`

`output_filepath`: Path to store the outputs. Defaults to `${cohort_dir}/${cohort_name}/${data.shard}.parquet` for MEDS with multiple shards, and `${cohort_dir}/${cohort_name}.parquet` otherwise.
`output_filepath`: Path to store the outputs. Defaults to `${cohort_dir}/${cohort_name}/${data.shard}.parquet` for MEDS with multiple shards, and `${cohort_dir}/${cohort_name}.parquet` otherwise.

`log_dir`: Path to store logs. Defaults to `${cohort_dir}/${cohort_name}/.logs`
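These appear to be interpolated configuration fields, so they can presumably be overridden on the command line in the same `key=value` style used above. The following is an assumption for illustration rather than documented behavior:

```bash
# Hypothetical overrides: redirect outputs and logs away from the defaults.
aces-cli cohort_dir='sample_configs/' cohort_name='inhospital_mortality' \
    data.standard='esgpt' data.path='sample_data/esgpt_sample/' \
    output_filepath='results/inhospital_mortality.parquet' \
    log_dir='results/.logs'
```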

#### Tab Completion

