Reference to Module API #59

Merged: 5 commits merged on Jun 13, 2024

6 changes: 3 additions & 3 deletions docs/source/terminology.md → docs/source/algorithm.md
@@ -114,8 +114,8 @@ In the rest of this document, we will detail how our algorithm automatically ext
these criteria and the terminology we use to describe our algorithm (both here and in the raw source code and
code comments). There are certain limitations of this algorithm where some kinds of tasks cannot yet be
expressed directly (more information available in the
[FAQs](https://eventstreamaces.readthedocs.io/en/latest/overview.html#faqs) and the
[Future Roadmap](https://eventstreamaces.readthedocs.io/en/latest/overview.html#future-roadmap)). Details
[FAQs](https://eventstreamaces.readthedocs.io/en/latest/readme.html#faqs) and the
[Future Roadmap](https://eventstreamaces.readthedocs.io/en/latest/readme.html#future-roadmap)). Details
about the true configuration language that is used in practice to specify "windows" can be found in
{doc}`/configuration`. Some task examples are available in {doc}`/notebooks/examples`.

@@ -188,7 +188,7 @@ During initialization, we will be given the following inputs:

##### `cfg`

`cfg` is a `TaskExtractorConfig` object containing our task definition, include all information about
`cfg` is a {py:class}`aces.config.TaskExtractorConfig` object containing our task definition, include all information about
Contributor

Ensure the class reference uses the correct Sphinx notation.

- cfg is a {py:class}`aces.config.TaskExtractorConfig`
+ cfg is a :py:class:`aces.config.TaskExtractorConfig`

This correction ensures that the class reference is properly formatted for Sphinx documentation, allowing for correct rendering and linking.

Suggested change
`cfg` is a {py:class}`aces.config.TaskExtractorConfig` object containing our task definition, include all information about
`cfg` is a :py:class:`aces.config.TaskExtractorConfig` object containing our task definition, include all information about
LanguageTool

[uncategorized] ~191-~191: This verb may not be in the correct form. Consider using a different form for this context. (AI_EN_LECTOR_REPLACEMENT_VERB_FORM)
Context: ... object containing our task definition, include all information about predicates, the t...

predicates, the trigger event, and windows.

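To make the three parts of `cfg` concrete, here is a sketch that mirrors them with plain dataclasses and fills them in with the in-hospital mortality task from `examples.ipynb`. It is a hypothetical illustration, not the real `aces.config.TaskExtractorConfig`: the class names `TaskSketch` and `WindowSketch`, the `"plain"`/`"derived"` markers, treating `discharge_or_death` as a derived predicate, the `end_inclusive` values for `gap` and `target`, and every time-expression string except `trigger + 24h` are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class WindowSketch:
    """Hypothetical stand-in for a window definition (not aces.config.WindowConfig)."""

    start: Optional[str]  # None stands for NULL, i.e. the start of the patient record
    end: str
    start_inclusive: bool
    end_inclusive: bool
    # predicate name -> count constraint such as "(5, None)"; None leaves a side open
    has: Dict[str, str] = field(default_factory=dict)
    label: Optional[str] = None
    index_timestamp: Optional[str] = None


@dataclass
class TaskSketch:
    """Hypothetical mirror of the three parts of a task definition."""

    predicates: Dict[str, str]  # unique predicate name -> "plain" or "derived"
    trigger: str  # name of the predicate that defines the trigger event
    windows: Dict[str, WindowSketch]  # unique window name -> window definition

    def __post_init__(self) -> None:
        # The trigger event must refer to a predicate defined in this task.
        if self.trigger not in self.predicates:
            raise ValueError(f"trigger {self.trigger!r} is not a defined predicate")


# The mortality task from examples.ipynb; the time expressions are illustrative only.
mortality = TaskSketch(
    predicates={
        "admission": "plain",
        "discharge": "plain",
        "death": "plain",
        "discharge_or_death": "derived",  # assumption: built from discharge and death
    },
    trigger="admission",
    windows={
        "input": WindowSketch(
            start=None, end="trigger + 24h", start_inclusive=True, end_inclusive=True,
            has={"_ANY_EVENT": "(5, None)"}, index_timestamp="end",
        ),
        "gap": WindowSketch(
            start="trigger", end="start + 48h", start_inclusive=False, end_inclusive=True,
            has={"admission": "(None, 0)", "discharge": "(None, 0)", "death": "(None, 0)"},
        ),
        "target": WindowSketch(
            start="gap.end", end="start -> discharge_or_death",
            start_inclusive=False, end_inclusive=True, label="death",
        ),
    },
)
```

Validating that `trigger` names a defined predicate in `__post_init__` is a design choice of this sketch; it catches a typo in the trigger before any extraction work starts.
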
##### `predicates_df`
16 changes: 8 additions & 8 deletions docs/source/configuration.md
@@ -6,7 +6,7 @@ format (recommended) or the [ESGPT](https://eventstreamml.readthedocs.io/en/late
system works by defining a configuration object that details the underlying concepts, inclusion/exclusion, and
labeling criteria for the cohort/task to be extracted, then using a recursive algorithm to identify all
realizations of valid patient time-ranges of data that satisfy those constraints from the raw data. For more
details on the recursive algorithm, see the `terminology.md` file.
details on the recursive algorithm, see [Algorithm Design](https://eventstreamaces.readthedocs.io/en/latest/technical.html#algorithm-design).

As indicated above, these cohorts are specified through a combination of concepts (realized as event
_predicate_ functions, _aka_ "predicates") which are _dataset specific_ and inclusion/exclusion/labeling
@@ -28,10 +28,10 @@ ______________________________________________________________________
In the machine form used by ACES, the configuration file consists of three parts:

- `predicates`, stored as a dictionary from string predicate names (which must be unique) to either
`PlainPredicateConfig` objects, which store raw predicates with no dependencies on other predicates, or
`DerivedPredicateConfig` objects, which store predicates that build on other predicates.
{py:class}`aces.config.PlainPredicateConfig` objects, which store raw predicates with no dependencies on other predicates, or
{py:class}`aces.config.DerivedPredicateConfig` objects, which store predicates that build on other predicates.
- `trigger`, stored as a string to `EventConfig`
- `windows`, stored as a dictionary from string window names (which must be unique) to `WindowConfig`
- `windows`, stored as a dictionary from string window names (which must be unique) to {py:class}`aces.config.WindowConfig`
Comment on lines +31 to +34
Contributor

Update class references to use consistent and correct Sphinx notation.

- {py:class}`aces.config.PlainPredicateConfig`
+ :py:class:`aces.config.PlainPredicateConfig`
- {py:class}`aces.config.DerivedPredicateConfig`
+ :py:class:`aces.config.DerivedPredicateConfig`
- {py:class}`aces.config.WindowConfig`
+ :py:class:`aces.config.WindowConfig`

This change corrects the syntax for referencing Python classes in Sphinx documentation, ensuring that the documentation is properly rendered.

Suggested change
{py:class}`aces.config.PlainPredicateConfig` objects, which store raw predicates with no dependencies on other predicates, or
{py:class}`aces.config.DerivedPredicateConfig` objects, which store predicates that build on other predicates.
- `trigger`, stored as a string to `EventConfig`
- `windows`, stored as a dictionary from string window names (which must be unique) to `WindowConfig`
- `windows`, stored as a dictionary from string window names (which must be unique) to {py:class}`aces.config.WindowConfig`
:py:class:`aces.config.PlainPredicateConfig` objects, which store raw predicates with no dependencies on other predicates, or
:py:class:`aces.config.DerivedPredicateConfig` objects, which store predicates that build on other predicates.
- `trigger`, stored as a string to `EventConfig`
- `windows`, stored as a dictionary from string window names (which must be unique) to :py:class:`aces.config.WindowConfig`
LanguageTool

[uncategorized] ~33-~33: Loose punctuation mark. (UNLIKELY_OPENING_PUNCTUATION)
Context: ...t build on other predicates. - trigger, stored as a string to EventConfig - `...

objects.

Below, we will detail each of these configuration objects.
@@ -40,7 +40,7 @@ ______________________________________________________________________

### Predicates: `PlainPredicateConfig` and `DerivedPredicateConfig`

#### `PlainPredicateConfig`: Configuration of Predicates that can be Computed Directly from Raw Data
#### {py:class}`aces.config.PlainPredicateConfig`: Configuration of Predicates that can be Computed Directly from Raw Data

These configs consist of the following four fields:

@@ -87,7 +87,7 @@ on its source format.
be of the univariate regression type and its value, if needed, will be pulled from the corresponding
column.

#### `DerivedPredicateConfig`: Configuration of Predicates that Depend on Other Predicates
#### {py:class}`aces.config.DerivedPredicateConfig`: Configuration of Predicates that Depend on Other Predicates

These configuration objects consist of only a single string field--`expr`--which contains a limited grammar of
accepted operations that can be applied to other predicates, containing precisely the following:
@@ -100,7 +100,7 @@ analytic operations over predicates.

______________________________________________________________________

### Events: `EventConfig`
### Events: {py:class}`aces.config.EventConfig`

The event config consists of only a single field, `predicate`, which specifies the predicate that must be
observed with value greater than one to satisfy the event. There can only be one defined "event" with an
@@ -110,7 +110,7 @@ The value of its field can be any defined predicate.

______________________________________________________________________

### Windows: `WindowConfig`
### Windows: {py:class}`aces.config.WindowConfig`

Windows contain a tracking `name` field, and otherwise are specified with two parts: (1) A set of four
parameters (`start`, `end`, `start_inclusive`, and `end_inclusive`) that specify the time range of the window,
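The window boundary flags (`start_inclusive`, `end_inclusive`) and the arrow-style event references (`->`) can be illustrated with a small Python sketch. It assumes events are `(timestamp, {predicate: count})` pairs and that an end such as "the next `discharge_or_death`" resolves to the first event strictly after the anchor with a positive count for that predicate. `in_window` and `next_event_after` are hypothetical helpers, not ACES internals, and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# An event is (timestamp, {predicate name: count}); this layout is assumed for illustration.
Event = Tuple[datetime, Dict[str, int]]


def in_window(ts: datetime, start: Optional[datetime], end: datetime,
              start_inclusive: bool, end_inclusive: bool) -> bool:
    """Return True if ts falls inside the window under the given boundary flags.

    start=None stands for NULL (the start of the patient record), i.e. no lower bound.
    """
    if start is not None and (ts < start or (ts == start and not start_inclusive)):
        return False
    if ts > end or (ts == end and not end_inclusive):
        return False
    return True


def next_event_after(events: List[Event], predicate: str, after: datetime) -> Optional[datetime]:
    """Return the first timestamp strictly after `after` where `predicate` has a positive count.

    Searching strictly after the anchor is an assumption of this sketch.
    """
    for ts, counts in sorted(events, key=lambda e: e[0]):
        if ts > after and counts.get(predicate, 0) > 0:
            return ts
    return None


admit = datetime(2024, 1, 1, 8, 0)
events: List[Event] = [
    (admit, {"admission": 1}),
    (admit + timedelta(hours=72), {"death": 1, "discharge_or_death": 1}),
]
gap_end = admit + timedelta(hours=48)  # gap ends 48 hours after the trigger
target_end = next_event_after(events, "discharge_or_death", gap_end)
print(target_end)  # 2024-01-04 08:00:00
print(in_window(gap_end, gap_end, target_end, False, True))  # False: start boundary excluded
print(in_window(target_end, gap_end, target_end, False, True))  # True: end boundary included
```

Excluding the start of `target` mirrors the notebook's reasoning: an event exactly at the end of `gap` has already been counted in `gap`.
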
14 changes: 9 additions & 5 deletions docs/source/notebooks/examples.ipynb
@@ -6,10 +6,10 @@
"source": [
"# Task Examples\n",
"\n",
"Provided below are two examples of mortality prediction tasks that ACES could easily extract subject cohorts for. The configurations have been tested all the provided synthetic data in the repository (`../../../sample_data/`), as well as the MIMIC-IV dataset loaded using MEDS & ESGPT (with very minor changes to the below predicate definition). The configuration files for both of these tasks are provided in the repository (`../../../sample_configs`), and cohorts can be extracted using the `aces-cli` tool:\n",
"Provided below are two examples of mortality prediction tasks that ACES could easily extract subject cohorts for. The configurations have been tested all the provided synthetic data in the repository ([`sample_data/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_data)), as well as the MIMIC-IV dataset loaded using MEDS & ESGPT (with very minor changes to the below predicate definition). The configuration files for both of these tasks are provided in the repository ([`sample_configs/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_configs)), and cohorts can be extracted using the `aces-cli` tool:\n",
"\n",
"```bash\n",
"aces-cli data.path='/path/to/MIMIC/ESGPT/schema/' data.standard='esgpt' cohort_dir='../../../sample_configs' cohort_name='...'\n",
"aces-cli data.path='/path/to/MIMIC/ESGPT/schema/' data.standard='esgpt' cohort_dir='sample_configs/' cohort_name='...'\n",
"```"
]
},
@@ -138,11 +138,15 @@
"\n",
"The windows section contains the remaining three windows we defined previously - `input`, `gap`, and `target`.\n",
"\n",
"`input` begins at the start of a patient's record (ie., `NULL`), and ends 24 hours past `trigger` (ie., `admission`). As we'd like to include the events specified at both the start and end of `input`, if present, we can set both `start_inclusive` and `end_inclusive` as `True`. Our constraint on the number of records is specified in `has` using the `_ANY_EVENT` predicate, with its value set to be greater or equal to 5 (ie., unbounded parameter on the right as seen in `(5, None)`). **Note**: Since we'd like to make a prediction at the end of `input`, we can set `index_timestamp` to be `end`, which corresponds to the timestamp of `trigger + 24h`.\n",
"`input` begins at the start of a patient's record (ie., `NULL`), and ends 24 hours past `trigger` (ie., `admission`). As we'd like to include the events specified at both the start and end of `input`, if present, we can set both `start_inclusive` and `end_inclusive` as `True`. Our constraint on the number of records is specified in `has` using the `_ANY_EVENT` predicate, with its value set to be greater or equal to 5 (ie., unbounded parameter on the right as seen in `(5, None)`). \n",
"\n",
"**Note**: Since we'd like to make a prediction at the end of `input`, we can set `index_timestamp` to be `end`, which corresponds to the timestamp of `trigger + 24h`.\n",
"\n",
"`gap` also begins at `trigger`, and ends 48 hours after. As we have included included the left boundary event in `trigger` (ie., `admission`), it would be reasonable to not include it again as it should not play a role in `gap`. As such, we set `start_inclusive` to `False`. As we'd like our admission to be at least 48 hours long, we can place constraints specifying that there cannot be any `admission`, `discharge`, or `death` in `gap` (ie., right-bounded parameter at `0` as seen in `(None, 0)`).\n",
"\n",
"`target` beings at the end of `gap`, and ends at the next discharge or death event (ie., `discharge_or_death` predicate). We can use this arrow notation which ACES recognizes as event references (ie., `->` and `<-`; see [Time Range Fields](https://eventstreamaces--39.org.readthedocs.build/en/39/configuration.html#time-range-fields)). In our case, we end `target` at the next `discharge_or_death`. Similarly, as we included the event at the end of `gap`, if any, already in `gap`, we can set `start_inclusive` to `False`. **Note**: Since we'd like to make a binary mortality prediction, we can extract the `death` predicate as a label from `target`, by specifying the `label` field to be `death`."
"`target` beings at the end of `gap`, and ends at the next discharge or death event (ie., `discharge_or_death` predicate). We can use this arrow notation which ACES recognizes as event references (ie., `->` and `<-`; see [Time Range Fields](https://eventstreamaces.readthedocs.io/en/latest/technical.html#time-range-fields)). In our case, we end `target` at the next `discharge_or_death`. Similarly, as we included the event at the end of `gap`, if any, already in `gap`, we can set `start_inclusive` to `False`. \n",
"\n",
"**Note**: Since we'd like to make a binary mortality prediction, we can extract the `death` predicate as a label from `target`, by specifying the `label` field to be `death`."
]
},
{
@@ -269,7 +273,7 @@
"source": [
"## Other Examples\n",
"\n",
"A few other examples are provided in `../../../sample_configs/` of the repository. We will continue to add task configurations to this folder or to a benchmarking effort for EHR representation learning. More information can be found [here](https://github.com/mmcdermott/PIE_MD/tree/main) - stay tuned!"
"A few other examples are provided in [`sample_configs/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_configs) of the repository. We will continue to add task configurations to this folder or to a benchmarking effort for EHR representation learning. More information can be found [here](https://github.com/mmcdermott/PIE_MD/tree/main) - stay tuned!"
]
}
],
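The `has` constraints in the mortality walkthrough read as inclusive `(min, max)` count bounds in which `None` leaves a side open: `(5, None)` means "at least 5" and `(None, 0)` means "none at all". Here is a minimal sketch of that check, assuming the constraints arrive as strings like those in the example. `parse_bounds`, `satisfies`, and `window_ok` are hypothetical names, and treating a predicate with no events in the window as a count of 0 is an assumption.

```python
import ast
from typing import Dict, Optional, Tuple

Bounds = Tuple[Optional[int], Optional[int]]


def parse_bounds(spec: str) -> Bounds:
    """Parse a string such as "(5, None)" into (min, max) without using eval()."""
    lo, hi = ast.literal_eval(spec)
    return lo, hi


def satisfies(count: int, bounds: Bounds) -> bool:
    """Check a count against inclusive (min, max) bounds; a None side is unbounded."""
    lo, hi = bounds
    if lo is not None and count < lo:
        return False
    if hi is not None and count > hi:
        return False
    return True


def window_ok(counts: Dict[str, int], has: Dict[str, str]) -> bool:
    """True if every constrained predicate's count in the window is within bounds (missing = 0)."""
    return all(satisfies(counts.get(name, 0), parse_bounds(spec)) for name, spec in has.items())


# Constraints from the mortality example: at least 5 events in `input`,
# and no admission, discharge, or death inside `gap`.
input_has = {"_ANY_EVENT": "(5, None)"}
gap_has = {"admission": "(None, 0)", "discharge": "(None, 0)", "death": "(None, 0)"}

print(window_ok({"_ANY_EVENT": 7}, input_has))  # True
print(window_ok({"_ANY_EVENT": 4}, input_has))  # False
print(window_ok({"_ANY_EVENT": 12}, gap_has))  # True: only the listed predicates are constrained
print(window_ok({"death": 1}, gap_has))  # False
```

Using `ast.literal_eval` rather than `eval` keeps the parser limited to literals such as integers, `None`, and tuples.
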
2 changes: 1 addition & 1 deletion docs/source/notebooks/predicates.ipynb
@@ -66,7 +66,7 @@
"source": [
"## Sample Predicates DataFrame\n",
"\n",
"A sample predicates dataframe is provided in the repository (`../../../sample_data/sample_data.csv`). This dataframe holds completely synthetic data and was designed such that the accompanying sample configuration files in the repository (`../../../sample_configs`) could be directly extracted."
"A sample predicates dataframe is provided in the repository ([`sample_data/sample_data.csv`](https://github.com/justin13601/ACES/blob/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_data/sample_data.csv)). This dataframe holds completely synthetic data and was designed such that the accompanying sample configuration files in the repository ([`sample_configs/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_configs)) could be directly extracted."
]
},
{
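The replacement links in these notebooks follow GitHub's commit-pinned URL shape, `/tree/<sha>/` for a directory and `/blob/<sha>/` for a file, so they keep pointing at the same snapshot as the default branch moves on. Here is a small helper of the kind one might use to generate them. `pinned_github_url` is hypothetical and not part of ACES, and the convention that directory paths are passed with a trailing `/` (as in the link text `sample_configs/`) is an assumption of this sketch.

```python
def pinned_github_url(owner: str, repo: str, sha: str, path: str) -> str:
    """Build a commit-pinned GitHub link; a trailing '/' marks a directory (tree), else a file (blob)."""
    kind = "tree" if path.endswith("/") else "blob"
    return f"https://github.com/{owner}/{repo}/{kind}/{sha}/{path.rstrip('/')}"


SHA = "5cf0261ad22c22972b0bd553ab5bb826cb9e637d"
print(pinned_github_url("justin13601", "ACES", SHA, "sample_configs/"))
print(pinned_github_url("justin13601", "ACES", SHA, "sample_data/sample_data.csv"))
```
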
2 changes: 1 addition & 1 deletion docs/source/notebooks/tutorial.ipynb
@@ -47,7 +47,7 @@
"source": [
"### Directories\n",
"\n",
"Next, let's specify our paths and directories. In this tutorial, we will extract a cohort for a typical in-hospital mortality prediction task from the ESGPT synthetic sample dataset. The task configuration file and sample data are both shipped with the repository in `sample_configs` and `sample_data` folders in the project root, respectively."
"Next, let's specify our paths and directories. In this tutorial, we will extract a cohort for a typical in-hospital mortality prediction task from the ESGPT synthetic sample dataset. The task configuration file and sample data are both shipped with the repository in [`sample_configs/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_configs) and [`sample_data/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_data) folders in the project root, respectively."
]
},
{
2 changes: 1 addition & 1 deletion docs/source/technical.md
@@ -3,5 +3,5 @@
```{include} configuration.md
```

```{include} terminology.md
```{include} algorithm.md
```
8 changes: 5 additions & 3 deletions docs/source/usage.md
@@ -175,15 +175,17 @@ To query from a direct predicates dataframe:

#### Task Configuration

`cohort_dir`: Directory the your task configuration file
`cohort_dir`: Directory of your task configuration file

`cohort_name`: Name of the task configuration file

The above two fields are used for automatically loading task configurations, saving results, and logging:

`config_path`: Path to the task configuration file. Defaults to `${cohort_dir}/${cohort_name}.yaml`

`output_filepath`: Path to store the outputs. Defaults to `${cohort_dir}/${cohort_name}/${data.shard}.parquet` for MEDS with multiple shards, and `${cohort_dir}/${cohort_name}.parquet` otherwise.
`output_filepath`: Path to store the outputs. Defaults to `${cohort_dir}/${cohort_name}/${data.shard}.parquet` for MEDS with multiple shards, and `${cohort_dir}/${cohort_name}.parquet` otherwise

`log_dir`: Path to store logs. Defaults to `${cohort_dir}/${cohort_name}/.logs`

#### Tab Completion

@@ -237,7 +239,7 @@ You can also use the `aces.query.query()` function to extract a cohort in Python
.. autofunction:: aces.query.query
```

The `cfg` parameter must be of type `config.TaskExtractorConfig`, and the `predicates_df` parameter must be of type `polars.DataFrame`.
The `cfg` parameter must be of type {py:class}`aces.config.TaskExtractorConfig`, and the `predicates_df` parameter must be of type `polars.DataFrame`.

Details about the configuration language used to define the `cfg` parameter can be found in {doc}`/configuration`.

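The `config_path`, `output_filepath`, and `log_dir` defaults listed under Task Configuration are plain path interpolations over `cohort_dir`, `cohort_name`, and, for sharded MEDS data, `data.shard`. Here is a sketch of that resolution in Python, assuming `shard` is passed only when MEDS data has multiple shards. `default_paths`, the task name `my_task`, and the shard name `train/0` are hypothetical, and the tool itself resolves these defaults through the `${...}` interpolation shown in the docs, not through this function. Using `posixpath.join` also tolerates a trailing slash such as the `cohort_dir='sample_configs/'` in the `aces-cli` example.

```python
import posixpath
from typing import Dict, Optional


def default_paths(cohort_dir: str, cohort_name: str, shard: Optional[str] = None) -> Dict[str, str]:
    """Mirror the documented defaults for config_path, output_filepath, and log_dir."""
    config_path = posixpath.join(cohort_dir, f"{cohort_name}.yaml")
    if shard is not None:
        # MEDS with multiple shards: one output file per shard under ${cohort_dir}/${cohort_name}/
        output_filepath = posixpath.join(cohort_dir, cohort_name, f"{shard}.parquet")
    else:
        output_filepath = posixpath.join(cohort_dir, f"{cohort_name}.parquet")
    log_dir = posixpath.join(cohort_dir, cohort_name, ".logs")
    return {"config_path": config_path, "output_filepath": output_filepath, "log_dir": log_dir}


print(default_paths("sample_configs/", "my_task")["config_path"])  # sample_configs/my_task.yaml
```
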