justin13601 · Jun 13, 2024
diff --git a/docs/source/terminology.md → docs/source/algorithm.md b/docs/source/terminology.md → docs/source/algorithm.md
@@ -114,8 +114,8 @@ In the rest of this document, we will detail how our algorithm automatically ext
 these criteria and the terminology we use to describe our algorithm (both here and in the raw source code and
 code comments). There are certain limitations of this algorithm where some kinds of tasks cannot yet be
 expressed directly (more information available in the
-[FAQs](https://eventstreamaces.readthedocs.io/en/latest/overview.html#faqs) and the
-[Future Roadmap](https://eventstreamaces.readthedocs.io/en/latest/overview.html#future-roadmap)). Details
+[FAQs](https://eventstreamaces.readthedocs.io/en/latest/readme.html#faqs) and the
+[Future Roadmap](https://eventstreamaces.readthedocs.io/en/latest/readme.html#future-roadmap)). Details
 about the true configuration language that is used in practice to specify "windows" can be found in
 {doc}`/configuration`. Some task examples are available in {doc}`/notebooks/examples`.
 
@@ -188,7 +188,7 @@ During initialization, we will be given the following inputs:
 
 ##### `cfg`
 
-`cfg` is a `TaskExtractorConfig` object containing our task definition, include all information about
+`cfg` is a {py:class}`aces.config.TaskExtractorConfig` object containing our task definition, include all information about
-`cfg` is a {py:class}`aces.config.TaskExtractorConfig` object containing our task definition, include all information about
+`cfg` is a :py:class:`aces.config.TaskExtractorConfig` object containing our task definition, including all information about
 predicates, the trigger event, and windows.
 
 ##### `predicates_df`

diff --git a/docs/source/configuration.md b/docs/source/configuration.md
@@ -6,7 +6,7 @@ format (recommended) or the [ESGPT](https://eventstreamml.readthedocs.io/en/late
 system works by defining a configuration object that details the underlying concepts, inclusion/exclusion, and
 labeling criteria for the cohort/task to be extracted, then using a recursive algorithm to identify all
 realizations of valid patient time-ranges of data that satisfy those constraints from the raw data. For more
-details on the recursive algorithm, see the `terminology.md` file.
+details on the recursive algorithm, see [Algorithm Design](https://eventstreamaces.readthedocs.io/en/latest/technical.html#algorithm-design).
 
 As indicated above, these cohorts are specified through a combination of concepts (realized as event
 _predicate_ functions, _aka_ "predicates") which are _dataset specific_ and inclusion/exclusion/labeling
@@ -28,10 +28,10 @@ ______________________________________________________________________
 In the machine form used by ACES, the configuration file consists of three parts:
 
 - `predicates`, stored as a dictionary from string predicate names (which must be unique) to either
-  `PlainPredicateConfig` objects, which store raw predicates with no dependencies on other predicates, or
-  `DerivedPredicateConfig` objects, which store predicates that build on other predicates.
+  {py:class}`aces.config.PlainPredicateConfig` objects, which store raw predicates with no dependencies on other predicates, or
+  {py:class}`aces.config.DerivedPredicateConfig` objects, which store predicates that build on other predicates.
 - `trigger`, stored as a string to `EventConfig`
-- `windows`, stored as a dictionary from string window names (which must be unique) to `WindowConfig`
+- `windows`, stored as a dictionary from string window names (which must be unique) to {py:class}`aces.config.WindowConfig`
-  {py:class}`aces.config.PlainPredicateConfig` objects, which store raw predicates with no dependencies on other predicates, or
-  {py:class}`aces.config.DerivedPredicateConfig` objects, which store predicates that build on other predicates.
- `trigger`, stored as a string to `EventConfig`
- `windows`, stored as a dictionary from string window names (which must be unique) to `WindowConfig`
- `windows`, stored as a dictionary from string window names (which must be unique) to {py:class}`aces.config.WindowConfig`
+  :py:class:`aces.config.PlainPredicateConfig` objects, which store raw predicates with no dependencies on other predicates, or
+  :py:class:`aces.config.DerivedPredicateConfig` objects, which store predicates that build on other predicates.
+- `trigger`, stored as a string to `EventConfig`
+- `windows`, stored as a dictionary from string window names (which must be unique) to :py:class:`aces.config.WindowConfig`
   objects.
 
 Below, we will detail each of these configuration objects.
@@ -40,7 +40,7 @@ ______________________________________________________________________
 
 ### Predicates: `PlainPredicateConfig` and `DerivedPredicateConfig`
 
-#### `PlainPredicateConfig`: Configuration of Predicates that can be Computed Directly from Raw Data
+#### {py:class}`aces.config.PlainPredicateConfig`: Configuration of Predicates that can be Computed Directly from Raw Data
 
 These configs consist of the following four fields:
 
@@ -87,7 +87,7 @@ on its source format.
    be of the univariate regression type and its value, if needed, will be pulled from the corresponding
    column.
 
-#### `DerivedPredicateConfig`: Configuration of Predicates that Depend on Other Predicates
+#### {py:class}`aces.config.DerivedPredicateConfig`: Configuration of Predicates that Depend on Other Predicates
 
 These configuration objects consist of only a single string field--`expr`--which contains a limited grammar of
 accepted operations that can be applied to other predicates, containing precisely the following:
@@ -100,7 +100,7 @@ analytic operations over predicates.
 
 ______________________________________________________________________
 
-### Events: `EventConfig`
+### Events: {py:class}`aces.config.EventConfig`
 
 The event config consists of only a single field, `predicate`, which specifies the predicate that must be
 observed with value greater than one to satisfy the event. There can only be one defined "event" with an
@@ -110,7 +110,7 @@ The value of its field can be any defined predicate.
 
 ______________________________________________________________________
 
-### Windows: `WindowConfig`
+### Windows: {py:class}`aces.config.WindowConfig`
 
 Windows contain a tracking `name` field, and otherwise are specified with two parts: (1) A set of four
 parameters (`start`, `end`, `start_inclusive`, and `end_inclusive`) that specify the time range of the window,

diff --git a/docs/source/notebooks/examples.ipynb b/docs/source/notebooks/examples.ipynb
@@ -6,10 +6,10 @@
    "source": [
     "# Task Examples\n",
     "\n",
-    "Provided below are two examples of mortality prediction tasks that ACES could easily extract subject cohorts for. The configurations have been tested all the provided synthetic data in the repository (`../../../sample_data/`), as well as the MIMIC-IV dataset loaded using MEDS & ESGPT (with very minor changes to the below predicate definition). The configuration files for both of these tasks are provided in the repository (`../../../sample_configs`), and cohorts can be extracted using the `aces-cli` tool:\n",
+    "Provided below are two examples of mortality prediction tasks that ACES could easily extract subject cohorts for. The configurations have been tested on all the provided synthetic data in the repository ([`sample_data/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_data)), as well as the MIMIC-IV dataset loaded using MEDS & ESGPT (with very minor changes to the below predicate definition). The configuration files for both of these tasks are provided in the repository ([`sample_configs/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_configs)), and cohorts can be extracted using the `aces-cli` tool:\n",
     "\n",
     "```bash\n",
-    "aces-cli data.path='/path/to/MIMIC/ESGPT/schema/' data.standard='esgpt' cohort_dir='../../../sample_configs' cohort_name='...'\n",
+    "aces-cli data.path='/path/to/MIMIC/ESGPT/schema/' data.standard='esgpt' cohort_dir='sample_configs/' cohort_name='...'\n",
     "```"
    ]
   },
@@ -138,11 +138,15 @@
     "\n",
     "The windows section contains the remaining three windows we defined previously - `input`, `gap`, and `target`.\n",
     "\n",
-    "`input` begins at the start of a patient's record (ie., `NULL`), and ends 24 hours past `trigger` (ie., `admission`). As we'd like to include the events specified at both the start and end of `input`, if present, we can set both `start_inclusive` and `end_inclusive` as `True`. Our constraint on the number of records is specified in `has` using the `_ANY_EVENT` predicate, with its value set to be greater or equal to 5 (ie., unbounded parameter on the right as seen in `(5, None)`). **Note**: Since we'd like to make a prediction at the end of `input`, we can set `index_timestamp` to be `end`, which corresponds to the timestamp of `trigger + 24h`.\n",
+    "`input` begins at the start of a patient's record (ie., `NULL`), and ends 24 hours past `trigger` (ie., `admission`). As we'd like to include the events specified at both the start and end of `input`, if present, we can set both `start_inclusive` and `end_inclusive` as `True`. Our constraint on the number of records is specified in `has` using the `_ANY_EVENT` predicate, with its value set to be greater or equal to 5 (ie., unbounded parameter on the right as seen in `(5, None)`). \n",
+    "\n",
+    "**Note**: Since we'd like to make a prediction at the end of `input`, we can set `index_timestamp` to be `end`, which corresponds to the timestamp of `trigger + 24h`.\n",
     "\n",
     "`gap` also begins at `trigger`, and ends 48 hours after. As we have included included the left boundary event in `trigger` (ie., `admission`), it would be reasonable to not include it again as it should not play a role in `gap`. As such, we set `start_inclusive` to `False`. As we'd like our admission to be at least 48 hours long, we can place constraints specifying that there cannot be any `admission`, `discharge`, or `death` in `gap` (ie., right-bounded parameter at `0` as seen in `(None, 0)`).\n",
     "\n",
-    "`target` beings at the end of `gap`, and ends at the next discharge or death event (ie., `discharge_or_death` predicate). We can use this arrow notation which ACES recognizes as event references (ie., `->` and `<-`; see [Time Range Fields](https://eventstreamaces--39.org.readthedocs.build/en/39/configuration.html#time-range-fields)). In our case, we end `target` at the next `discharge_or_death`. Similarly, as we included the event at the end of `gap`, if any, already in `gap`, we can set `start_inclusive` to `False`. **Note**: Since we'd like to make a binary mortality prediction, we can extract the `death` predicate as a label from `target`, by specifying the `label` field to be `death`."
+    "`target` begins at the end of `gap`, and ends at the next discharge or death event (ie., `discharge_or_death` predicate). We can use this arrow notation which ACES recognizes as event references (ie., `->` and `<-`; see [Time Range Fields](https://eventstreamaces.readthedocs.io/en/latest/technical.html#time-range-fields)). In our case, we end `target` at the next `discharge_or_death`. Similarly, as we included the event at the end of `gap`, if any, already in `gap`, we can set `start_inclusive` to `False`. \n",
+    "\n",
+    "**Note**: Since we'd like to make a binary mortality prediction, we can extract the `death` predicate as a label from `target`, by specifying the `label` field to be `death`."
    ]
   },
   {
@@ -269,7 +273,7 @@
    "source": [
     "## Other Examples\n",
     "\n",
-    "A few other examples are provided in `../../../sample_configs/` of the repository. We will continue to add task configurations to this folder or to a benchmarking effort for EHR representation learning. More information can be found [here](https://github.com/mmcdermott/PIE_MD/tree/main) - stay tuned!"
+    "A few other examples are provided in [`sample_configs/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_configs) of the repository. We will continue to add task configurations to this folder or to a benchmarking effort for EHR representation learning. More information can be found [here](https://github.com/mmcdermott/PIE_MD/tree/main) - stay tuned!"
    ]
   }
  ],

diff --git a/docs/source/notebooks/predicates.ipynb b/docs/source/notebooks/predicates.ipynb
@@ -66,7 +66,7 @@
    "source": [
     "## Sample Predicates DataFrame\n",
     "\n",
-    "A sample predicates dataframe is provided in the repository (`../../../sample_data/sample_data.csv`). This dataframe holds completely synthetic data and was designed such that the accompanying sample configuration files in the repository (`../../../sample_configs`) could be directly extracted."
+    "A sample predicates dataframe is provided in the repository ([`sample_data/sample_data.csv`](https://github.com/justin13601/ACES/blob/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_data/sample_data.csv)). This dataframe holds completely synthetic data and was designed such that the accompanying sample configuration files in the repository ([`sample_configs/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_configs)) could be directly extracted."
    ]
   },
   {

diff --git a/docs/source/notebooks/tutorial.ipynb b/docs/source/notebooks/tutorial.ipynb
@@ -47,7 +47,7 @@
    "source": [
     "### Directories\n",
     "\n",
-    "Next, let's specify our paths and directories. In this tutorial, we will extract a cohort for a typical in-hospital mortality prediction task from the ESGPT synthetic sample dataset. The task configuration file and sample data are both shipped with the repository in `sample_configs` and `sample_data` folders in the project root, respectively."
+    "Next, let's specify our paths and directories. In this tutorial, we will extract a cohort for a typical in-hospital mortality prediction task from the ESGPT synthetic sample dataset. The task configuration file and sample data are both shipped with the repository in [`sample_configs/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_configs) and [`sample_data/`](https://github.com/justin13601/ACES/tree/5cf0261ad22c22972b0bd553ab5bb826cb9e637d/sample_data) folders in the project root, respectively."
    ]
   },
   {

diff --git a/docs/source/technical.md b/docs/source/technical.md
@@ -3,5 +3,5 @@
 ```{include} configuration.md
 ```
 
-```{include} terminology.md
+```{include} algorithm.md
 ```
diff --git a/docs/source/usage.md b/docs/source/usage.md
@@ -175,15 +175,17 @@ To query from a direct predicates dataframe:
 
 #### Task Configuration
 
-`cohort_dir`: Directory the your task configuration file
+`cohort_dir`: Directory of your task configuration file
 
 `cohort_name`: Name of the task configuration file
 
 The above two fields are used for automatically loading task configurations, saving results, and logging:
 
 `config_path`: Path to the task configuration file. Defaults to `${cohort_dir}/${cohort_name}.yaml`
 
-`output_filepath`: Path to store the outputs. Defaults to `${cohort_dir}/${cohort_name}/${data.shard}.parquet` for MEDS with multiple shards, and `${cohort_dir}/${cohort_name}.parquet` otherwise.
+`output_filepath`: Path to store the outputs. Defaults to `${cohort_dir}/${cohort_name}/${data.shard}.parquet` for MEDS with multiple shards, and `${cohort_dir}/${cohort_name}.parquet` otherwise
+
+`log_dir`: Path to store logs. Defaults to `${cohort_dir}/${cohort_name}/.logs`
 
 #### Tab Completion
 
@@ -237,7 +239,7 @@ You can also use the `aces.query.query()` function to extract a cohort in Python
 .. autofunction:: aces.query.query
 ```
 
-The `cfg` parameter must be of type `config.TaskExtractorConfig`, and the `predicates_df` parameter must be of type `polars.DataFrame`.
+The `cfg` parameter must be of type {py:class}`aces.config.TaskExtractorConfig`, and the `predicates_df` parameter must be of type `polars.DataFrame`.
 
 Details about the configuration language used to define the `cfg` parameter can be found in {doc}`/configuration`.
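
Note on the `usage.md` hunk above: the documented defaults for `config_path`, `output_filepath`, and `log_dir` are all derived from `cohort_dir` and `cohort_name`. The sketch below only restates those defaults in plain Python to make the resulting layout easier to see. It is not how ACES resolves them (the `${...}` syntax suggests the configuration system handles interpolation), and the helper name `default_paths` and its `shard` argument are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional


def default_paths(cohort_dir: str, cohort_name: str, shard: Optional[str] = None) -> Dict[str, Path]:
    """Restate the documented defaults; this helper is illustrative, not part of ACES."""
    base = Path(cohort_dir)
    task_dir = base / cohort_name

    # Sharded MEDS data gets one parquet file per shard inside the task directory;
    # otherwise a single parquet file sits next to the task configuration file.
    if shard is not None:
        output = task_dir / f"{shard}.parquet"
    else:
        output = base / f"{cohort_name}.parquet"

    return {
        "config_path": base / f"{cohort_name}.yaml",
        "output_filepath": output,
        "log_dir": task_dir / ".logs",
    }


if __name__ == "__main__":
    # "cohorts" and "my_task" are placeholder values, not names from the repository.
    for key, value in default_paths("cohorts", "my_task").items():
        print(f"{key}: {value}")
```

With these defaults, logs always land under the task directory, while the output location depends on whether the data is sharded.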