
Documentation Changes #57

Merged
merged 14 commits on Jun 13, 2024
14 changes: 13 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -300,9 +300,21 @@ The `has` field specifies constraints relating to predicates within the window.

Support for static data depends on your data standard and how those variables are expressed. For instance, in MEDS, it is feasible to express static data as a predicate, and thus criteria can be set normally. However, this is not yet incorporated for ESGPT. If a predicates dataframe is used directly, you may create a predicate column that specifies your static variable.
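As an illustration, a static variable can simply be broadcast onto every event row as its own predicate column. A minimal, dependency-free Python sketch (the `is_female` column name and the static values are hypothetical, not part of ACES):

```python
# Hypothetical static data: subject_id -> recorded sex.
static_data = {1: "female", 2: "male", 3: "female"}

# A predicates dataframe sketched as a list of row dicts.
rows = [
    {"subject_id": 1, "timestamp": "1989-01-01 00:00:00", "admission": 1},
    {"subject_id": 2, "timestamp": "1991-05-06 12:00:00", "admission": 1},
    {"subject_id": 3, "timestamp": "1980-10-17 22:00:00", "admission": 1},
]

# Broadcast the static variable onto every row as a 0/1 predicate column.
for row in rows:
    row["is_female"] = int(static_data[row["subject_id"]] == "female")

print([r["is_female"] for r in rows])  # -> [1, 0, 1]
```

The resulting column can then be referenced from window criteria like any other predicate.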

### Complementary Tools

ACES is an integral part of the MEDS ecosystem. To fully leverage its capabilities, you can utilize it alongside other complementary MEDS tools, such as:

- [MEDS-ETL](https://github.com/Medical-Event-Data-Standard/meds_etl), which can be used to transform various data schemas, including some common data models, into the MEDS format.
- [MEDS-TAB](https://github.com/Medical-Event-Data-Standard/meds_etl), which can be used to generate automated tabular baselines (i.e., XGBoost over ACES-defined tasks).
- [MEDS-Polars](https://github.com/Medical-Event-Data-Standard/meds_etl), which contains Polars-based ETL scripts.

### Alternative Tools

TODO
There are existing alternatives for cohort extraction that focus on specific common data models, such as [i2b2 PIC-SURE](https://pic-sure.org/) and [OHDSI ATLAS](https://atlas.ohdsi.org/).

ACES serves as a middle ground between PIC-SURE and ATLAS. While it may offer less capability than PIC-SURE, it compensates with greater ease of use and improved communication value. Compared to ATLAS, ACES provides greater capability, though with slightly lower ease of use, yet it still maintains a higher communication value.

Finally, ACES is not tied to a particular common data model. Built on a flexible event-stream format, ACES is a no-code solution with a descriptive input format, permitting easy and wide iteration over task definitions, and can be applied to a variety of schemas, making it a versatile tool suitable for diverse research needs.

## Future Roadmap

2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -321,7 +321,7 @@ def ensure_pandoc_installed(_):


# -- Options for LaTeX output

# latex_engine = "xelatex"
Contributor

Consider removing the commented-out preamble settings if they are no longer needed to clean up the configuration file.

-    "preamble": "\n".join(
-        [
-            r"\usepackage{svg}",
-            ...
-            r"\DeclareUnicodeCharacter{2559}{+}",
-        ]
-    ),

Committable suggestion was skipped due to low confidence.

latex_elements = { # type: ignore
# The paper size ("letterpaper" or "a4paper").
"papersize": "letterpaper",
6 changes: 1 addition & 5 deletions docs/source/configuration.md
@@ -1,6 +1,4 @@
# Configuration Language Specification

## Introduction and Terminology
## Configuration Language Specification

This document specifies the configuration language for the automatic extraction of task dataframes and cohorts
from structured EHR data organized either via the [MEDS](https://github.com/Medical-Event-Data-Standard/meds)
@@ -27,8 +25,6 @@ contain events that satisfy certain aggregation functions over predicates for th

______________________________________________________________________

## Machine Form (ACES)

In the machine form, the configuration file consists of three parts:

- `predicates`, stored as a dictionary from string predicate names (which must be unique) to either
19 changes: 9 additions & 10 deletions docs/source/index.md
@@ -14,11 +14,10 @@ maxdepth: 2
GitHub README <readme>
Usage Guide <usage>
Task Examples <notebooks/examples>
Sample Data Tutorial <notebooks/tutorial>
Predicates DataFrame <notebooks/predicates>
Configuration Language <configuration>
Algorithm & Terminology <terminology>
Profiling <profiling>
Sample Data Tutorial <notebooks/tutorial>
Technical Details <technical>
Computational Profile <profiling>
Module API Reference <api/modules>
License <license>
```
@@ -29,29 +28,29 @@ ______________________________________________________________________

If you have a dataset and want to leverage it for machine learning tasks, the ACES ecosystem offers a streamlined and user-friendly approach. Here's how you can easily transform, prepare, and utilize your dataset with MEDS and ACES for efficient and effective machine learning:

### 1. Transform to MEDS
### I. Transform to MEDS

- Simplicity: Converting your dataset to the Medical Event Data Standard (MEDS) is straightforward and user-friendly compared to other Common Data Models (CDMs).
- Minimal Bias: This conversion process ensures that your data remains as close to its raw form as possible, minimizing the introduction of biases.
- [MEDS-ETL](https://github.com/Medical-Event-Data-Standard/meds_etl): Follow this link for detailed instructions and ETLs to transform your dataset into the MEDS format!

### 2. Identify Predicates
### II. Identify Predicates

- Task-Specific Concepts: Identify the predicates (data concepts) required for your specific machine learning tasks.
- Pre-Defined Criteria: Utilize our pre-defined criteria across various tasks and clinical areas to expedite this process.
- [PIE-MD](https://github.com/mmcdermott/PIE_MD/tree/main/tasks/criteria): Access our repository of tasks to find relevant predicates!

### 3. Set Dataset-Agnostic Criteria
### III. Set Dataset-Agnostic Criteria

- Standardization: Combine the identified predicates with standardized, dataset-agnostic criteria files.
- Examples: Refer to the [MIMIC-IV](https://github.com/mmcdermott/PIE_MD/tree/main/tasks/MIMIC-IV) and [eICU](https://github.com/mmcdermott/PIE_MD/tree/main/tasks/eICU) examples for guidance on how to structure your criteria files for your private datasets!

### 4. Run ACES
### IV. Run ACES

- Run the ACES Command-Line Interface tool (`aces-cli`) to extract cohorts based on your task - check out the [Usage Guide](https://eventstreamaces.readthedocs.io/en/latest/usage.html)!

### 5. Run MEDS-Tab
### V. Run MEDS-Tab

- Painless Reproducibility: Use [MEDS-Tab](https://github.com/mmcdermott/MEDS_TAB_MIMIC_IV/tree/main/tasks) to obtain comparable, reproducible, and well-tuned XGBoost results tailored to your dataset-specific feature space!

By following these steps, you can seamlessly transform your dataset, define necessary criteria, and leverage powerful machine learning tools within the ACES ecosystem. This approach not only simplifies the process but also ensures high-quality, reproducible results for your machine learning for health projects. It can reliably take no more than a week of full-time human effort to perform steps 1-5 on new datasets in reasonable raw formulations!
By following these steps, you can seamlessly transform your dataset, define necessary criteria, and leverage powerful machine learning tools within the ACES ecosystem. This approach not only simplifies the process but also ensures high-quality, reproducible results for your machine learning for health projects. It can reliably take no more than a week of full-time human effort to perform Steps I-V on new datasets in reasonable raw formulations!
Contributor

Consider simplifying the sentence for better readability.

- This approach not only simplifies the process but also ensures high-quality, reproducible results for your machine learning for health projects. It can reliably take no more than a week of full-time human effort to perform Steps I-V on new datasets in reasonable raw formulations!
+ This approach simplifies the process and ensures high-quality, reproducible results. Typically, Steps I-V can be completed within a week of full-time effort on new datasets.
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
By following these steps, you can seamlessly transform your dataset, define necessary criteria, and leverage powerful machine learning tools within the ACES ecosystem. This approach not only simplifies the process but also ensures high-quality, reproducible results for your machine learning for health projects. It can reliably take no more than a week of full-time human effort to perform Steps I-V on new datasets in reasonable raw formulations!
By following these steps, you can seamlessly transform your dataset, define necessary criteria, and leverage powerful machine learning tools within the ACES ecosystem. This approach simplifies the process and ensures high-quality, reproducible results. Typically, Steps I-V can be completed within a week of full-time effort on new datasets.

2 changes: 0 additions & 2 deletions docs/source/license.md
@@ -5,5 +5,3 @@
language: text
---
```

______________________________________________________________________
27 changes: 13 additions & 14 deletions docs/source/notebooks/predicates.ipynb
@@ -26,19 +26,19 @@
"\n",
"| subject_id | timestamp | code | value |\n",
"|------------|---------------------|-------------------------|-------------------------|\n",
"| 1 | 1989-01-01 00:00:00 | ADMISSION | |\n",
"| 1 | 1989-01-01 00:00:00 | ADMISSION | null |\n",
"| 1 | 1989-01-01 01:00:00 | LAB//HR | 90 |\n",
"| 1 | 1989-01-01 01:00:00 | PROCEDURE_START | |\n",
"| 1 | 1989-01-01 02:00:00 | DISCHARGE | |\n",
"| 1 | 1989-01-01 02:00:00 | PROCEDURE_END | |\n",
"| 2 | 1991-05-06 12:00:00 | ADMISSION | |\n",
"| 2 | 1991-05-06 20:00:00 | DEATH | |\n",
"| 3 | 1980-10-17 22:00:00 | ADMISSION | |\n",
"| 1 | 1989-01-01 01:00:00 | PROCEDURE_START | null |\n",
"| 1 | 1989-01-01 02:00:00 | DISCHARGE | null |\n",
"| 1 | 1989-01-01 02:00:00 | PROCEDURE_END | null |\n",
"| 2 | 1991-05-06 12:00:00 | ADMISSION | null |\n",
"| 2 | 1991-05-06 20:00:00 | DEATH | null |\n",
"| 3 | 1980-10-17 22:00:00 | ADMISSION | null |\n",
"| 3 | 1980-10-17 22:00:00 | LAB//HR | 120 |\n",
"| 3 | 1980-10-18 01:00:00 | LAB//temp | 37 |\n",
"| 3 | 1980-10-18 09:00:00 | DISCHARGE | |\n",
"| 3 | 1982-02-02 02:00:00 | ADMISSION | |\n",
"| 3 | 1982-02-02 04:00:00 | DEATH | |\n",
"| 3 | 1980-10-18 09:00:00 | DISCHARGE | null |\n",
"| 3 | 1982-02-02 02:00:00 | ADMISSION | null |\n",
"| 3 | 1982-02-02 04:00:00 | DEATH | null |\n",
"\n",
"The `code` column contains a string of an event that occurred at the given `timestamp` for a given `subject_id`. You may then create a series of predicate columns depending on what suits your needs. For instance, here are some plausible predicate columns that could be created:\n",
"\n",
@@ -95,7 +95,7 @@
"\n",
"ACES is able to automatically compute the predicates dataframe from your dataset and the fields defined in your task configuration if you are using the MEDS or ESGPT data standard. Should you choose not to transform your dataset into one of these two currently supported standards, you may instead perform the transformation yourself by creating your own predicates dataframe.\n",
"\n",
"Again, it is acceptable if your own predicates dataframe only contains `plain` predicate columns, as ACES can automatically create `derived` predicate columns from boolean logic in the task configuration file. However, for complex predicates that would be difficult to express using the simple boolean formulas in the configuration file, we recommend also creating them manually prior to using ACES.\n",
"Again, it is acceptable if your own predicates dataframe only contains `plain` predicate columns, as ACES can automatically create `derived` predicate columns from boolean logic in the task configuration file. However, for complex predicates that would be impossible to express (outside of `and/or`) in the configuration file, we direct you to create them manually prior to using ACES. Support for additional complex predicates is planned for the future, including the ability to use SQL or other expressions (see [#47](https://github.com/justin13601/ACES/issues/47)).\n",
"\n",
"**Note**: when creating `plain` predicate columns directly, you must still define them in the configuration file (they can be defined with an arbitrary value in the `code` field) - ACES will verify their existence after data loading (i.e., by validating that a column exists with the predicate name in your dataframe). You will also need them for referencing in your windows."
]
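For intuition, the plain predicate columns described above amount to simple string comparisons over the `code` column, and a derived column is just boolean logic over plain ones. A stand-alone sketch (column names are illustrative, not ACES's internals):

```python
# Events sketched as row dicts with a MEDS-style `code` column.
events = [
    {"subject_id": 1, "code": "ADMISSION"},
    {"subject_id": 1, "code": "LAB//HR"},
    {"subject_id": 1, "code": "DISCHARGE"},
    {"subject_id": 2, "code": "DEATH"},
]

for ev in events:
    # Plain predicate columns: direct comparisons against `code`.
    ev["admission"] = int(ev["code"] == "ADMISSION")
    ev["discharge"] = int(ev["code"] == "DISCHARGE")
    ev["death"] = int(ev["code"] == "DEATH")
    ev["lab"] = int(ev["code"].startswith("LAB//"))
    # Derived predicate column: boolean logic over plain columns.
    ev["discharge_or_death"] = int(ev["discharge"] or ev["death"])

print([e["discharge_or_death"] for e in events])  # -> [0, 0, 1, 1]
```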
@@ -108,11 +108,10 @@
"\n",
"```yaml\n",
"predicates:\n",
" ...\n",
" death:\n",
" code: foo\n",
" code: defined in data\n",
" discharge:\n",
" code: bar\n",
" code: defined in data\n",
" discharge_or_death:\n",
" expr: or(discharge, death)\n",
" ...\n",
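The `expr` values in the configuration snippet above use a small `and(...)`/`or(...)` mini-language. A toy evaluator for flat expressions of this form (a sketch for intuition only, not ACES's actual parser):

```python
import re

def eval_expr(expr: str, row: dict) -> int:
    """Evaluate a flat `and(a, b, ...)` / `or(a, b, ...)` expression against a row."""
    m = re.fullmatch(r"(and|or)\((\w+(?:,\s*\w+)+)\)", expr)
    if not m:
        raise ValueError(f"unsupported expression: {expr}")
    op = m.group(1)
    args = [a.strip() for a in m.group(2).split(",")]
    vals = [bool(row[a]) for a in args]  # look up plain predicate columns
    return int(all(vals) if op == "and" else any(vals))

row = {"discharge": 0, "death": 1}
print(eval_expr("or(discharge, death)", row))  # -> 1
```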
4 changes: 2 additions & 2 deletions docs/source/notebooks/tutorial.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Code Example with ESGPT Synthetic Data"
"# Code Example with Synthetic Data"
]
},
{
@@ -137,7 +137,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## ESGPT Data"
"## Data"
]
},
{
13 changes: 12 additions & 1 deletion docs/source/profiling.md
@@ -1 +1,12 @@
# TODO - include the table from supplementary
# Computational Profile

| Task | # Patients | # Samples | Total Time (secs) | Max Mem (MB) |
| ------------------------------------- | ---------- | --------- | ----------------- | ------------ |
| First 24h in-hospital mortality | - | - | - | - |
| First 48h in-hospital mortality | - | - | - | - |
| First 24h in-ICU mortality | - | - | - | - |
| First 48h in-ICU mortality | - | - | - | - |
| 30d post-hospital-discharge mortality | - | - | - | - |
| 30d re-admission | - | - | - | - |
| Hospital length-of-stay | - | - | - | - |
| ICU length-of-stay | - | - | - | - |
5 changes: 5 additions & 0 deletions docs/source/technical.md
@@ -0,0 +1,5 @@
```{include} configuration.md
```

```{include} terminology.md
```
4 changes: 1 addition & 3 deletions docs/source/terminology.md
@@ -1,6 +1,4 @@
# Algorithm & Design

## Introduction
## Algorithm & Design

We will assume that we are given a dataframe `df` which details events that have happened to subjects. Each
row in the dataframe will have a `subject_id` column which identifies the subject, and a `timestamp` column
Contributor

Address grammatical issue in the introductory paragraph.

- which details events that have happened to subjects.
+ detailing events that have occurred to subjects.
Committable suggestion


Suggested change
We will assume that we are given a dataframe `df` which details events that have happened to subjects. Each
We will assume that we are given a dataframe `df` detailing events that have occurred to subjects. Each