diff --git a/docs/source/notebooks/examples.ipynb b/docs/source/notebooks/examples.ipynb
index 31d3286..1ccbdd1 100644
--- a/docs/source/notebooks/examples.ipynb
+++ b/docs/source/notebooks/examples.ipynb
@@ -138,11 +138,11 @@
  "\n",
  "The windows section contains the remaining three windows we defined previously - `input`, `gap`, and `target`.\n",
  "\n",
- "`input` begins at the start of a patient's record (ie., `NULL`), and ends 24 hours past `trigger` (ie., `admission`). As we'd like to include the events specified at both the start and end of `input`, if present, we can set both `start_inclusive` and `end_inclusive` as `True`. Our constraint on the number of records is specified in `has` using the `_ANY_EVENT` predicate, with its value set to be greater or equal to 5 (ie., unbounded parameter on the right as seen in `(5, None)`). **Note**: since we'd like to make a prediction at the end of `input`, we can set `index_timestamp` to be `end`, which corresponds to the timestamp of `trigger + 24h`.\n",
+ "`input` begins at the start of a patient's record (ie., `NULL`), and ends 24 hours past `trigger` (ie., `admission`). As we'd like to include the events specified at both the start and end of `input`, if present, we can set both `start_inclusive` and `end_inclusive` to `True`. Our constraint on the number of records is specified in `has` using the `_ANY_EVENT` predicate, with its value set to be greater than or equal to 5 (ie., unbounded parameter on the right as seen in `(5, None)`). **Note**: Since we'd like to make a prediction at the end of `input`, we can set `index_timestamp` to be `end`, which corresponds to the timestamp of `trigger + 24h`.\n",
  "\n",
  "`gap` also begins at `trigger`, and ends 48 hours after. As we have included included the left boundary event in `trigger` (ie., `admission`), it would be reasonable to not include it again as it should not play a role in `gap`. As such, we set `start_inclusive` to `False`. As we'd like our admission to be at least 48 hours long, we can place constraints specifying that there cannot be any `admission`, `discharge`, or `death` in `gap` (ie., right-bounded parameter at `0` as seen in `(None, 0)`).\n",
  "\n",
- "`target` beings at the end of `gap`, and ends at the next discharge or death event (ie., `discharge_or_death` predicate). We can use this arrow notation which ACES recognizes as event references (ie., `->` and `<-`; see [Time Range Fields](https://eventstreamaces--39.org.readthedocs.build/en/39/configuration.html#time-range-fields)). In our case, we end `target` at the next `discharge_or_death`. Similarly, as we included the event at the end of `gap`, if any, already in `gap`, we can set `start_inclusive` to `False`. **Note**: since we'd like to make a binary mortality prediction, we can extract the `death` predicate as a label from `target`, by specifying the `label` field to be `death`."
+ "`target` begins at the end of `gap`, and ends at the next discharge or death event (ie., `discharge_or_death` predicate). We can use arrow notation, which ACES recognizes as event references (ie., `->` and `<-`; see [Time Range Fields](https://eventstreamaces--39.org.readthedocs.build/en/39/configuration.html#time-range-fields)). In our case, we end `target` at the next `discharge_or_death`. Similarly, as we included the event at the end of `gap`, if any, already in `gap`, we can set `start_inclusive` to `False`. **Note**: Since we'd like to make a binary mortality prediction, we can extract the `death` predicate as a label from `target`, by specifying the `label` field to be `death`."
  ]
  },
  {
diff --git a/docs/source/notebooks/predicates.ipynb b/docs/source/notebooks/predicates.ipynb
index f310637..08f062c 100644
--- a/docs/source/notebooks/predicates.ipynb
+++ b/docs/source/notebooks/predicates.ipynb
@@ -55,7 +55,7 @@
  "| 3 | 1982-02-02 02:00:00 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
  "| 3 | 1982-02-02 04:00:00 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |\n",
  "\n",
- "Note that this set of predicates are all `plain` predicates (ie., explicitly expressed as a value in the dataset), with the exception of the `derived` predicate `discharge_or_death`, which can be expressed by applying boolean logic on the `discharge` and `death` predicates (ie., `or(discharge, death)`). You may choose to create these columns for `derived` predicates explicitly (as you would `plain` predicates). Or, ACES can automatically create them from `plain` predicates if the boolean logic is provided in the task configuration file. Please see [Predicates](https://eventstreamaces.readthedocs.io/en/latest/configuration.html#predicates-plainpredicateconfig-and-derivedpredicateconfig) for more information.\n",
+ "**Note**: The predicates in this set are all `plain` predicates (ie., explicitly expressed as a value in the dataset), with the exception of the `derived` predicate `discharge_or_death`, which can be expressed by applying boolean logic on the `discharge` and `death` predicates (ie., `or(discharge, death)`). You may choose to create these columns for `derived` predicates explicitly (as you would `plain` predicates). Or, ACES can automatically create them from `plain` predicates if the boolean logic is provided in the task configuration file. Please see [Predicates](https://eventstreamaces.readthedocs.io/en/latest/configuration.html#predicates-plainpredicateconfig-and-derivedpredicateconfig) for more information.\n",
  "\n",
  "Additionally, you may notice that the tables differ in shape. In the original raw data, (`subject_id`, `timestamp`) is not unique. However, a final predicates dataframe must have unique (`subject_id`, `timestamp`) pairs. If the MEDS or ESGPT standard is used, ACES will automatically collapse rows down into unique per-patient per-timestamp levels (ie., grouping by these two columns and aggregating by summing predicate counts). However, if creating predicate columns directly, please ensure your dataframe is unique over (`subject_id`, `timestamp`)."
  ]
diff --git a/docs/source/notebooks/tutorial.ipynb b/docs/source/notebooks/tutorial.ipynb
index d2335bb..381a451 100644
--- a/docs/source/notebooks/tutorial.ipynb
+++ b/docs/source/notebooks/tutorial.ipynb
@@ -209,7 +209,7 @@
  "\n",
  "Each row of the resulting dataframe is a valid realization of our task tree. Hence, each instance can be included in our cohort used for the prediction of in-hospital mortality as defined in our task configuration file. The output contains:\n",
  "\n",
- "- `subject_id`: subject IDs of our cohort (note: since we'd like to treat individual admissions as separate samples, there will be duplicate subject IDs)\n",
+ "- `subject_id`: subject IDs of our cohort (since we'd like to treat individual admissions as separate samples, there will be duplicate subject IDs)\n",
  "- `index_timestamp`: timestamp of when a prediction is made, which coincides with the `end` timestamp of the `input` window (as specified in our task configuration)\n",
  "- `label`: binary label of mortality, which is derived from the `death` predicate of the `target` window (as specified in our task configuration)\n",
  "- `trigger`: timestamp of the `trigger` event, which is the `admission` predicate (as specified in our task configuration)\n",
diff --git a/docs/source/profiling.md b/docs/source/profiling.md
index aed3fec..77ec4ed 100644
--- a/docs/source/profiling.md
+++ b/docs/source/profiling.md
@@ -6,11 +6,11 @@ The MIMIC-IV MEDS schema has approximately 50,000 patients per shard with an ave
 
 All tests were executed on a Linux server with 36 cores and 340 GBs of RAM available. A single MEDS shard was used, which provides a bounded computational overview of ACES. For instance, if one shard costs $M$ memory and $T$ time, then $N$ shards may be executed in parallel with $N*M$ memory and $T$ time, or in series with $M$ memory and $T*N$ time.
 
-| Task | # Patients | # Samples | Total Time (secs) | Max Mem (MiBs) |
-| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- | --------- | ----------------- | -------------- |
-| [First 24h in-hospital mortality](https://github.com/mmcdermott/PIE_MD/blob/e94189864080f957fcf2b7416c1dde401dfe4c15/tasks/MIMIC-IV/mortality/in_hospital/first_24h.yaml) | 20,971 | 58,823 | 363.09 | 106,367.14 |
-| [First 48h in-hospital mortality](https://github.com/mmcdermott/PIE_MD/blob/e94189864080f957fcf2b7416c1dde401dfe4c15/tasks/MIMIC-IV/mortality/in_hospital/first_48h.yaml) | 18,847 | 60,471 | 364.62 | 108,913.95 |
-| [First 24h in-ICU mortality](https://github.com/mmcdermott/PIE_MD/blob/e94189864080f957fcf2b7416c1dde401dfe4c15/tasks/MIMIC-IV/mortality/in_icu/first_24h.yaml) | 4,768 | 7,156 | 216.81 | 39,594.37 |
-| [First 48h in-ICU mortality](https://github.com/mmcdermott/PIE_MD/blob/e94189864080f957fcf2b7416c1dde401dfe4c15/tasks/MIMIC-IV/mortality/in_icu/first_48h.yaml) | 4,093 | 7,112 | 217.98 | 39,451.86 |
-| [30d post-hospital-discharge mortality](https://github.com/mmcdermott/PIE_MD/blob/e94189864080f957fcf2b7416c1dde401dfe4c15/tasks/MIMIC-IV/mortality/post_hospital_discharge/30d.yaml) | 28,416 | 68,547 | 182.91 | 30,434.86 |
-| [30d re-admission](https://github.com/mmcdermott/PIE_MD/blob/e94189864080f957fcf2b7416c1dde401dfe4c15/tasks/MIMIC-IV/readmission/30d.yaml) | 18,908 | 464,821 | 367.41 | 106,064.04 |
+| Task | # Patients | # Samples | Total Time (secs) | Max Memory (MiBs) |
+| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- | --------- | ----------------- | ----------------- |
+| [First 24h in-hospital mortality](https://github.com/mmcdermott/PIE_MD/blob/e94189864080f957fcf2b7416c1dde401dfe4c15/tasks/MIMIC-IV/mortality/in_hospital/first_24h.yaml) | 20,971 | 58,823 | 363.09 | 106,367.14 |
+| [First 48h in-hospital mortality](https://github.com/mmcdermott/PIE_MD/blob/e94189864080f957fcf2b7416c1dde401dfe4c15/tasks/MIMIC-IV/mortality/in_hospital/first_48h.yaml) | 18,847 | 60,471 | 364.62 | 108,913.95 |
+| [First 24h in-ICU mortality](https://github.com/mmcdermott/PIE_MD/blob/e94189864080f957fcf2b7416c1dde401dfe4c15/tasks/MIMIC-IV/mortality/in_icu/first_24h.yaml) | 4,768 | 7,156 | 216.81 | 39,594.37 |
+| [First 48h in-ICU mortality](https://github.com/mmcdermott/PIE_MD/blob/e94189864080f957fcf2b7416c1dde401dfe4c15/tasks/MIMIC-IV/mortality/in_icu/first_48h.yaml) | 4,093 | 7,112 | 217.98 | 39,451.86 |
+| [30d post-hospital-discharge mortality](https://github.com/mmcdermott/PIE_MD/blob/e94189864080f957fcf2b7416c1dde401dfe4c15/tasks/MIMIC-IV/mortality/post_hospital_discharge/30d.yaml) | 28,416 | 68,547 | 182.91 | 30,434.86 |
+| [30d re-admission](https://github.com/mmcdermott/PIE_MD/blob/e94189864080f957fcf2b7416c1dde401dfe4c15/tasks/MIMIC-IV/readmission/30d.yaml) | 18,908 | 464,821 | 367.41 | 106,064.04 |
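The predicates notebook touched by this patch says ACES collapses raw rows into unique (`subject_id`, `timestamp`) pairs by grouping on those two columns and summing predicate counts. It also says that a `derived` predicate such as `discharge_or_death` is `or(discharge, death)`. The following pure-Python sketch illustrates both steps on hypothetical toy rows. It is not ACES's implementation, which works on dataframes, and the helpers `collapse_predicates` and `derive_or` are hypothetical. Treating the derived predicate as 1 whenever any source count is positive is an assumption.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical toy rows: (subject_id, timestamp, {predicate: count}).
# (subject_id, timestamp) is deliberately NOT unique, as in raw data:
# subject 2 has a discharge row and a death row at the same timestamp.
RAW_ROWS = [
    (1, datetime(2020, 1, 1, 8, 0), {"admission": 1}),
    (1, datetime(2020, 1, 3, 12, 0), {"discharge": 1}),
    (2, datetime(2020, 2, 1, 9, 0), {"admission": 1}),
    (2, datetime(2020, 2, 4, 17, 30), {"discharge": 1}),
    (2, datetime(2020, 2, 4, 17, 30), {"death": 1}),
]


def collapse_predicates(rows):
    """Hypothetical helper: group by (subject_id, timestamp) and sum predicate counts."""
    totals = defaultdict(lambda: defaultdict(int))
    for subject_id, timestamp, counts in rows:
        for predicate, count in counts.items():
            totals[(subject_id, timestamp)][predicate] += count
    # Sort by (subject_id, timestamp) so the output order is deterministic.
    return {key: dict(totals[key]) for key in sorted(totals)}


def derive_or(collapsed, name, *sources):
    """Hypothetical helper: add `name` = 1 if any source count is positive, else 0.

    Predicates missing from a row are treated as a count of 0 (assumption).
    """
    for counts in collapsed.values():
        counts[name] = int(any(counts.get(source, 0) > 0 for source in sources))
    return collapsed


predicates = derive_or(
    collapse_predicates(RAW_ROWS), "discharge_or_death", "discharge", "death"
)
for (subject_id, timestamp), counts in predicates.items():
    print(subject_id, timestamp, counts)
```

After collapsing, the five toy rows become four, one per (`subject_id`, `timestamp`) key. This is the uniqueness property the notebook asks you to guarantee when building predicate columns yourself.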