Reference to Module API (#59)
* Attempt reference to module api?

* Using mystparser

* Add reference to module api

* More reference and fixes

* Undo reference in notebooks, fix links
justin13601 authored Jun 13, 2024
1 parent 5cf0261 commit c0713cb
Showing 7 changed files with 28 additions and 22 deletions.
6 changes: 3 additions & 3 deletions docs/source/terminology.md → docs/source/algorithm.md
@@ -114,8 +114,8 @@ In the rest of this document, we will detail how our algorithm automatically ext
these criteria and the terminology we use to describe our algorithm (both here and in the raw source code and
code comments). There are certain limitations of this algorithm where some kinds of tasks cannot yet be
expressed directly (more information available in the
[FAQs](https://eventstreamaces.readthedocs.io/en/latest/overview.html#faqs) and the
[Future Roadmap](https://eventstreamaces.readthedocs.io/en/latest/overview.html#future-roadmap)). Details
[FAQs](https://eventstreamaces.readthedocs.io/en/latest/readme.html#faqs) and the
[Future Roadmap](https://eventstreamaces.readthedocs.io/en/latest/readme.html#future-roadmap)). Details
about the true configuration language that is used in practice to specify "windows" can be found in
{doc}`/configuration`. Some task examples are available in {doc}`/notebooks/examples`.

@@ -188,7 +188,7 @@ During initialization, we will be given the following inputs:

##### `cfg`

`cfg` is a `TaskExtractorConfig` object containing our task definition, include all information about
`cfg` is a {py:class}`aces.config.TaskExtractorConfig` object containing our task definition, including all information about
predicates, the trigger event, and windows.

##### `predicates_df`
16 changes: 8 additions & 8 deletions docs/source/configuration.md
@@ -6,7 +6,7 @@ format (recommended) or the [ESGPT](https://eventstreamml.readthedocs.io/en/late
system works by defining a configuration object that details the underlying concepts, inclusion/exclusion, and
labeling criteria for the cohort/task to be extracted, then using a recursive algorithm to identify all
realizations of valid patient time-ranges of data that satisfy those constraints from the raw data. For more
details on the recursive algorithm, see the `terminology.md` file.
details on the recursive algorithm, see [Algorithm Design](https://eventstreamaces.readthedocs.io/en/latest/technical.html#algorithm-design).

As indicated above, these cohorts are specified through a combination of concepts (realized as event
_predicate_ functions, _aka_ "predicates") which are _dataset specific_ and inclusion/exclusion/labeling
@@ -28,10 +28,10 @@ ______________________________________________________________________
In the machine form used by ACES, the configuration file consists of three parts:

- `predicates`, stored as a dictionary from string predicate names (which must be unique) to either
`PlainPredicateConfig` objects, which store raw predicates with no dependencies on other predicates, or
`DerivedPredicateConfig` objects, which store predicates that build on other predicates.
{py:class}`aces.config.PlainPredicateConfig` objects, which store raw predicates with no dependencies on other predicates, or
{py:class}`aces.config.DerivedPredicateConfig` objects, which store predicates that build on other predicates.
- `trigger`, stored as a string to `EventConfig`
- `windows`, stored as a dictionary from string window names (which must be unique) to `WindowConfig`
- `windows`, stored as a dictionary from string window names (which must be unique) to {py:class}`aces.config.WindowConfig`
objects.

Below, we will detail each of these configuration objects.
@@ -40,7 +40,7 @@ ______________________________________________________________________

### Predicates: `PlainPredicateConfig` and `DerivedPredicateConfig`

#### `PlainPredicateConfig`: Configuration of Predicates that can be Computed Directly from Raw Data
#### {py:class}`aces.config.PlainPredicateConfig`: Configuration of Predicates that can be Computed Directly from Raw Data

These configs consist of the following four fields:

@@ -87,7 +87,7 @@ on its source format.
be of the univariate regression type and its value, if needed, will be pulled from the corresponding
column.

#### `DerivedPredicateConfig`: Configuration of Predicates that Depend on Other Predicates
#### {py:class}`aces.config.DerivedPredicateConfig`: Configuration of Predicates that Depend on Other Predicates

These configuration objects consist of only a single string field--`expr`--which contains a limited grammar of
accepted operations that can be applied to other predicates, containing precisely the following:
@@ -100,7 +100,7 @@ analytic operations over predicates.

______________________________________________________________________

### Events: `EventConfig`
### Events: {py:class}`aces.config.EventConfig`

The event config consists of only a single field, `predicate`, which specifies the predicate that must be
observed with value greater than one to satisfy the event. There can only be one defined "event" with an
@@ -110,7 +110,7 @@ The value of its field can be any defined predicate.

______________________________________________________________________

### Windows: `WindowConfig`
### Windows: {py:class}`aces.config.WindowConfig`

Windows contain a tracking `name` field, and otherwise are specified with two parts: (1) A set of four
parameters (`start`, `end`, `start_inclusive`, and `end_inclusive`) that specify the time range of the window,
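The three-part machine form described in configuration.md (a `predicates` dictionary, a `trigger` event, and a `windows` dictionary) can be sketched with plain dataclasses. This is a hypothetical illustration, not the real `aces.config` classes: the names `PlainPredicate`, `DerivedPredicate`, `Event`, `Window`, and `TaskConfig`, the placeholder `code` field, and the single validation rule are assumptions. The four `PlainPredicateConfig` fields are not shown in this diff, and the `expr` and window `end` strings below are illustrative values that are stored but never parsed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class PlainPredicate:
    # Placeholder field (assumption); the real PlainPredicateConfig has four fields not shown here.
    code: str


@dataclass
class DerivedPredicate:
    # Single string field, mirroring DerivedPredicateConfig's `expr`; stored, not parsed.
    expr: str


@dataclass
class Event:
    # The trigger event names exactly one predicate.
    predicate: str


@dataclass
class Window:
    # The four time-range parameters listed for WindowConfig; None stands for an open start.
    start: Optional[str]
    end: Optional[str]
    start_inclusive: bool
    end_inclusive: bool


@dataclass
class TaskConfig:
    predicates: Dict[str, Union[PlainPredicate, DerivedPredicate]]
    trigger: Event
    windows: Dict[str, Window] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Dict keys already make names unique; additionally require that the
        # trigger refers to a defined predicate.
        if self.trigger.predicate not in self.predicates:
            raise ValueError(f"trigger predicate {self.trigger.predicate!r} is not defined")


# Illustrative task: admission trigger, a derived discharge-or-death predicate, one window.
cfg = TaskConfig(
    predicates={
        "admission": PlainPredicate(code="ADMISSION"),
        "discharge": PlainPredicate(code="DISCHARGE"),
        "death": PlainPredicate(code="DEATH"),
        "discharge_or_death": DerivedPredicate(expr="or(discharge, death)"),
    },
    trigger=Event(predicate="admission"),
    windows={"gap": Window(start="trigger", end="start + 48h",
                           start_inclusive=False, end_inclusive=True)},
)
```

Using dictionaries keyed by name makes the "names must be unique" requirement hold by construction, so the only cross-reference this sketch checks is that the trigger names a defined predicate.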
14 changes: 9 additions & 5 deletions docs/source/notebooks/examples.ipynb
@@ -6,10 +6,10 @@
"source": [
"# Task Examples\n",
"\n",
"Provided below are two examples of mortality prediction tasks that ACES could easily extract subject cohorts for. The configurations have been tested all the provided synthetic data in the repository (`../../../sample_data/`), as well as the MIMIC-IV dataset loaded using MEDS & ESGPT (with very minor changes to the below predicate definition). The configuration files for both of these tasks are provided in the repository (`../../../sample_configs`), and cohorts can be extracted using the `aces-cli` tool:\n",
"Provided below are two examples of mortality prediction tasks that ACES could easily extract subject cohorts for. The configurations have been tested on all the provided synthetic data in the repository ([`sample_data/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_data)), as well as the MIMIC-IV dataset loaded using MEDS & ESGPT (with very minor changes to the below predicate definition). The configuration files for both of these tasks are provided in the repository ([`sample_configs/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_configs)), and cohorts can be extracted using the `aces-cli` tool:\n",
"\n",
"```bash\n",
"aces-cli data.path='/path/to/MIMIC/ESGPT/schema/' data.standard='esgpt' cohort_dir='../../../sample_configs' cohort_name='...'\n",
"aces-cli data.path='/path/to/MIMIC/ESGPT/schema/' data.standard='esgpt' cohort_dir='sample_configs/' cohort_name='...'\n",
"```"
]
},
@@ -138,11 +138,15 @@
"\n",
"The windows section contains the remaining three windows we defined previously - `input`, `gap`, and `target`.\n",
"\n",
"`input` begins at the start of a patient's record (ie., `NULL`), and ends 24 hours past `trigger` (ie., `admission`). As we'd like to include the events specified at both the start and end of `input`, if present, we can set both `start_inclusive` and `end_inclusive` as `True`. Our constraint on the number of records is specified in `has` using the `_ANY_EVENT` predicate, with its value set to be greater or equal to 5 (ie., unbounded parameter on the right as seen in `(5, None)`). **Note**: Since we'd like to make a prediction at the end of `input`, we can set `index_timestamp` to be `end`, which corresponds to the timestamp of `trigger + 24h`.\n",
"`input` begins at the start of a patient's record (i.e., `NULL`), and ends 24 hours past `trigger` (i.e., `admission`). As we'd like to include the events specified at both the start and end of `input`, if present, we can set both `start_inclusive` and `end_inclusive` to `True`. Our constraint on the number of records is specified in `has` using the `_ANY_EVENT` predicate, with its value set to be greater than or equal to 5 (i.e., no upper bound, as seen in `(5, None)`).\n",
"\n",
"**Note**: Since we'd like to make a prediction at the end of `input`, we can set `index_timestamp` to be `end`, which corresponds to the timestamp of `trigger + 24h`.\n",
"\n",
"`gap` also begins at `trigger`, and ends 48 hours after. As we have already included the left boundary event in `trigger` (i.e., `admission`), it is reasonable not to include it again, as it should not play a role in `gap`. As such, we set `start_inclusive` to `False`. As we'd like our admission to be at least 48 hours long, we can place constraints specifying that there cannot be any `admission`, `discharge`, or `death` in `gap` (i.e., an upper bound of `0`, as seen in `(None, 0)`).\n",
"\n",
"`target` beings at the end of `gap`, and ends at the next discharge or death event (ie., `discharge_or_death` predicate). We can use this arrow notation which ACES recognizes as event references (ie., `->` and `<-`; see [Time Range Fields](https://eventstreamaces--39.org.readthedocs.build/en/39/configuration.html#time-range-fields)). In our case, we end `target` at the next `discharge_or_death`. Similarly, as we included the event at the end of `gap`, if any, already in `gap`, we can set `start_inclusive` to `False`. **Note**: Since we'd like to make a binary mortality prediction, we can extract the `death` predicate as a label from `target`, by specifying the `label` field to be `death`."
"`target` begins at the end of `gap`, and ends at the next discharge or death event (i.e., the `discharge_or_death` predicate). We can use the arrow notation that ACES recognizes as event references (i.e., `->` and `<-`; see [Time Range Fields](https://eventstreamaces.readthedocs.io/en/latest/technical.html#time-range-fields)). In our case, we end `target` at the next `discharge_or_death`. Similarly, as the event at the end of `gap`, if any, is already included in `gap`, we can set `start_inclusive` to `False`.\n",
"\n",
"**Note**: Since we'd like to make a binary mortality prediction, we can extract the `death` predicate as a label from `target`, by specifying the `label` field to be `death`."
]
},
{
@@ -269,7 +273,7 @@
"source": [
"## Other Examples\n",
"\n",
"A few other examples are provided in `../../../sample_configs/` of the repository. We will continue to add task configurations to this folder or to a benchmarking effort for EHR representation learning. More information can be found [here](https://github.com/mmcdermott/PIE_MD/tree/main) - stay tuned!"
"A few other examples are provided in [`sample_configs/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_configs) of the repository. We will continue to add task configurations to this folder or to a benchmarking effort for EHR representation learning. More information can be found [here](https://github.com/mmcdermott/PIE_MD/tree/main) - stay tuned!"
]
}
],
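The `has` constraints and inclusivity flags discussed in the examples notebook can be illustrated with a toy counter over event timestamps. This is a sketch under assumptions, not how ACES implements it (ACES queries a `polars.DataFrame` of predicates): the helpers `in_window`, `count_in_window`, and `satisfies` are hypothetical, `None` as a window start stands in for the start of the record, and the timestamps are made up.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# (min, max) count bound; None leaves that side unbounded, as in (5, None) or (None, 0).
Bound = Tuple[Optional[int], Optional[int]]


def in_window(ts: datetime, start: Optional[datetime], end: Optional[datetime],
              start_inclusive: bool, end_inclusive: bool) -> bool:
    """Return True if `ts` lies in the window; start=None means 'start of record'."""
    if start is not None and (ts < start or (ts == start and not start_inclusive)):
        return False
    if end is not None and (ts > end or (ts == end and not end_inclusive)):
        return False
    return True


def count_in_window(timestamps: Iterable[datetime], start: Optional[datetime],
                    end: Optional[datetime], start_inclusive: bool, end_inclusive: bool) -> int:
    """Count the timestamps that fall inside the window."""
    return sum(in_window(t, start, end, start_inclusive, end_inclusive) for t in timestamps)


def satisfies(count: int, bound: Bound) -> bool:
    """Check a count against a (min, max) constraint, both ends inclusive."""
    lo, hi = bound
    return (lo is None or count >= lo) and (hi is None or count <= hi)


# Toy record: the trigger (admission) at t=0, plus events at these hour offsets.
trigger = datetime(2020, 1, 1)
any_events = [trigger + timedelta(hours=h) for h in (-48, -24, 0, 6, 12, 24, 30)]

# `input`: start of record through trigger + 24h, both ends inclusive, needs >= 5 events.
input_end = trigger + timedelta(hours=24)  # also the index_timestamp in the example
n_input = count_in_window(any_events, None, input_end, True, True)
print(n_input, satisfies(n_input, (5, None)))  # 6 True

# `gap`: (trigger, trigger + 48h]; a discharge at 30h violates the (None, 0) constraint.
discharges = [trigger + timedelta(hours=30)]
n_gap = count_in_window(discharges, trigger, trigger + timedelta(hours=48), False, True)
print(n_gap, satisfies(n_gap, (None, 0)))  # 1 False
```

Because `gap` sets `start_inclusive` to `False`, an admission timestamped exactly at `trigger` is not counted a second time in `gap`.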
2 changes: 1 addition & 1 deletion docs/source/notebooks/predicates.ipynb
@@ -66,7 +66,7 @@
"source": [
"## Sample Predicates DataFrame\n",
"\n",
"A sample predicates dataframe is provided in the repository (`../../../sample_data/sample_data.csv`). This dataframe holds completely synthetic data and was designed such that the accompanying sample configuration files in the repository (`../../../sample_configs`) could be directly extracted."
"A sample predicates dataframe is provided in the repository ([`sample_data/sample_data.csv`](https://github.com/justin13601/ACES/blob/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_data/sample_data.csv)). This dataframe holds completely synthetic data and was designed such that the accompanying sample configuration files in the repository ([`sample_configs/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_configs)) could be directly extracted."
]
},
{
2 changes: 1 addition & 1 deletion docs/source/notebooks/tutorial.ipynb
@@ -47,7 +47,7 @@
"source": [
"### Directories\n",
"\n",
"Next, let's specify our paths and directories. In this tutorial, we will extract a cohort for a typical in-hospital mortality prediction task from the ESGPT synthetic sample dataset. The task configuration file and sample data are both shipped with the repository in `sample_configs` and `sample_data` folders in the project root, respectively."
"Next, let's specify our paths and directories. In this tutorial, we will extract a cohort for a typical in-hospital mortality prediction task from the ESGPT synthetic sample dataset. The task configuration file and sample data are both shipped with the repository in [`sample_configs/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_configs) and [`sample_data/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_data) folders in the project root, respectively."
]
},
{
2 changes: 1 addition & 1 deletion docs/source/technical.md
@@ -3,5 +3,5 @@
```{include} configuration.md
```

```{include} terminology.md
```{include} algorithm.md
```
8 changes: 5 additions & 3 deletions docs/source/usage.md
@@ -175,15 +175,17 @@ To query from a direct predicates dataframe:

#### Task Configuration

`cohort_dir`: Directory the your task configuration file
`cohort_dir`: Directory of your task configuration file

`cohort_name`: Name of the task configuration file

The above two fields are used for automatically loading task configurations, saving results, and logging:

`config_path`: Path to the task configuration file. Defaults to `${cohort_dir}/${cohort_name}.yaml`

`output_filepath`: Path to store the outputs. Defaults to `${cohort_dir}/${cohort_name}/${data.shard}.parquet` for MEDS with multiple shards, and `${cohort_dir}/${cohort_name}.parquet` otherwise.
`output_filepath`: Path to store the outputs. Defaults to `${cohort_dir}/${cohort_name}/${data.shard}.parquet` for MEDS with multiple shards, and `${cohort_dir}/${cohort_name}.parquet` otherwise

`log_dir`: Path to store logs. Defaults to `${cohort_dir}/${cohort_name}/.logs`

#### Tab Completion

@@ -237,7 +239,7 @@ You can also use the `aces.query.query()` function to extract a cohort in Python
.. autofunction:: aces.query.query
```

The `cfg` parameter must be of type `config.TaskExtractorConfig`, and the `predicates_df` parameter must be of type `polars.DataFrame`.
The `cfg` parameter must be of type {py:class}`aces.config.TaskExtractorConfig`, and the `predicates_df` parameter must be of type `polars.DataFrame`.

Details about the configuration language used to define the `cfg` parameter can be found in {doc}`/configuration`.

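The default paths listed in usage.md are all derived from `cohort_dir` and `cohort_name`. The function below is a hypothetical helper that reproduces those documented defaults with `PurePosixPath`; it is not part of `aces`, and `shard` is an assumed stand-in for the `${data.shard}` value, passed only for MEDS data with multiple shards. The task name `my_task` is illustrative.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def default_paths(cohort_dir: str, cohort_name: str,
                  shard: Optional[str] = None) -> Dict[str, PurePosixPath]:
    """Reproduce the documented defaults for config_path, output_filepath, and log_dir."""
    root = PurePosixPath(cohort_dir)
    if shard is not None:
        # MEDS with multiple shards: ${cohort_dir}/${cohort_name}/${data.shard}.parquet
        output = root / cohort_name / f"{shard}.parquet"
    else:
        # Otherwise: ${cohort_dir}/${cohort_name}.parquet
        output = root / f"{cohort_name}.parquet"
    return {
        "config_path": root / f"{cohort_name}.yaml",  # ${cohort_dir}/${cohort_name}.yaml
        "output_filepath": output,
        "log_dir": root / cohort_name / ".logs",  # ${cohort_dir}/${cohort_name}/.logs
    }


paths = default_paths("sample_configs", "my_task")
print(paths["output_filepath"])  # sample_configs/my_task.parquet
```

In the CLI these values appear to come from `${...}` config interpolation, so explicitly passing `config_path`, `output_filepath`, or `log_dir` would override the pattern this sketch reproduces.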