Final fixes #30

justin13601 · 2024-05-22T19:03:12Z

Summary by CodeRabbit

New Features
- Added new condition has: _ANY_EVENT: (5, None) in inhospital-mortality.yaml.
- Introduced abnormally_high_spo2 and redefined abnormal_spo2 as a combination of low and high SpO2 conditions in abnormal-lab.yaml.
Updates
- Renamed predicates in long_term-incidence.yaml to specific ICD9CM and ICD10CM codes.
- Updated configuration documentation with detailed explanations and specifications.
Enhancements
- Changed logging levels from debug to info for better visibility of actions and processes across multiple modules and functions.
- Improved logging for generating predicate columns and querying processes.
Documentation
- Comprehensive update to configuration language specification in configuration.md.
- Renamed sections in index.md and terminology.md for clarity.

coderabbitai · 2024-05-22T19:03:18Z

Walkthrough

The recent updates involve transitioning logging levels from debug to info across various modules, enhancing visibility of key operations. Function signatures and configuration files have been refined for improved type safety and clarity. Documentation has significantly expanded to provide detailed explanations of the configuration language and project terms.
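To illustrate why moving messages from `debug` to `info` raises their visibility, here is a minimal sketch using Python's standard `logging` module. This is an assumption made for illustration: the thread does not show which logging library backs the project's `logger`, and the logger name `demo` and the helper `emitted_messages` are hypothetical.

```python
import io
import logging


def emitted_messages(level: int) -> list[str]:
    """Return the messages that reach a handler when the logger threshold is `level`."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

    logger = logging.getLogger("demo")  # hypothetical logger name
    logger.handlers.clear()  # start from a clean logger on every call
    logger.propagate = False
    logger.addHandler(handler)
    logger.setLevel(level)

    # The same message at both levels: only those at or above `level` are kept.
    logger.debug("Generating predicate columns...")
    logger.info("Generating predicate columns...")
    return stream.getvalue().splitlines()
```

With the threshold at `logging.INFO`, only the `info` call is emitted; a message left at `debug` would appear only if the threshold were lowered to `logging.DEBUG`.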

Changes

| File/Path | Change Summary |
|---|---|
| `run.py`, `src/esgpt_task_querying/config.py`, `src/esgpt_task_querying/constraints.py`, `src/esgpt_task_querying/extract_subtree.py`, `src/esgpt_task_querying/predicates.py`, `src/esgpt_task_querying/query.py`, `src/esgpt_task_querying/utils.py` | Updated log levels from `debug` to `info` for various functions and actions. Improved clarity and visibility in logging. |
| `sample_configs/01-inhospital-mortality.yaml`, `sample_configs/02-abnormal-lab.yaml`, `sample_configs/06-long_term-incidence.yaml` | Refactored and added conditions in configuration files for better specificity and representation. |
| `docs/source/configuration.md`, `docs/source/index.md`, `docs/source/terminology.md` | Extensive updates in documentation to provide detailed explanations of configuration language and terms used in the project. |

🐇
In logs we trust, from debug to info,
Clarity shines, our code's new credo.
Predicates refined, configurations clear,
Documentation expanded, no need to fear.
Through code and config, we hop with glee,
A better system, for all to see.
🐇

coderabbitai

Actionable comments posted: 0

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between c9c081e and 613db2d.

Files ignored due to path filters (1)

result.csv is excluded by !**/*.csv

Files selected for processing (6)

run.py (1 hunks)
src/esgpt_task_querying/config.py (1 hunks)
src/esgpt_task_querying/constraints.py (1 hunks)
src/esgpt_task_querying/extract_subtree.py (1 hunks)
src/esgpt_task_querying/predicates.py (4 hunks)
src/esgpt_task_querying/query.py (1 hunks)

Files skipped from review due to trivial changes (6)

run.py
src/esgpt_task_querying/config.py
src/esgpt_task_querying/constraints.py
src/esgpt_task_querying/extract_subtree.py
src/esgpt_task_querying/predicates.py
src/esgpt_task_querying/query.py

coderabbitai

Actionable comments posted: 0

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 613db2d and 8883af4.

Files selected for processing (3)

sample_configs/02-abnormal-lab.yaml (1 hunks)
src/esgpt_task_querying/config.py (13 hunks)
src/esgpt_task_querying/predicates.py (2 hunks)

Files skipped from review as they are similar to previous changes (2)

src/esgpt_task_querying/config.py
src/esgpt_task_querying/predicates.py

Additional comments not posted (5)

sample_configs/02-abnormal-lab.yaml (5)

10-10: The definition of abnormally_low_spo2 is clear and logically consistent.

14-17: The definition of abnormally_high_spo2 is clear and logically consistent.

18-19: The definition of abnormal_spo2 using an OR expression to combine low and high conditions is appropriate and clear.

19-19: The use of normal_spo2 as the trigger event is appropriate and aligns with the defined normal range.

Line range hint 21-37: The configuration of time windows for SpO2 monitoring is well-structured and clear.
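The OR combination described for `abnormal_spo2` can be sketched in plain Python as follows. The threshold values `low` and `high` and the function name are hypothetical placeholders; the actual values live in `sample_configs/02-abnormal-lab.yaml` and are not quoted in this thread.

```python
def abnormal_spo2_flags(spo2_values, low=90.0, high=120.0):
    """Flag each SpO2 reading as abnormally low, abnormally high, or either.

    `low` and `high` are hypothetical thresholds used only for illustration.
    """
    rows = []
    for value in spo2_values:
        abnormally_low = int(value < low)
        abnormally_high = int(value > high)
        rows.append(
            {
                "abnormally_low_spo2": abnormally_low,
                "abnormally_high_spo2": abnormally_high,
                # Derived predicate: true when either plain predicate is true.
                "abnormal_spo2": int(bool(abnormally_low or abnormally_high)),
            }
        )
    return rows
```

Defining the combined predicate from the two plain ones keeps each threshold in a single place, so changing the low or high bound updates `abnormal_spo2` automatically.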

codecov · 2024-05-23T06:09:10Z

Codecov Report

Attention: Patch coverage is 19.14894%, with 76 lines in your changes missing coverage. Please review.

| Files | Coverage Δ | |
|---|---|---|
| src/esgpt_task_querying/constraints.py | `100.00% <100.00%> (ø)` | |
| src/esgpt_task_querying/extract_subtree.py | `91.89% <75.00%> (ø)` | |
| src/esgpt_task_querying/config.py | `82.17% <0.00%> (+9.04%)` | ⬆️ |
| src/esgpt_task_querying/query.py | `30.43% <25.00%> (+17.47%)` | ⬆️ |
| src/esgpt_task_querying/utils.py | `56.52% <47.05%> (ø)` | |
| src/esgpt_task_querying/predicates.py | `12.16% <7.69%> (-71.18%)` | ⬇️ |

coderabbitai

Actionable comments posted: 2

Outside diff range and nitpick comments (2)

docs/source/terminology.md (2)
Line range hint 124-124: Correct the heading level to maintain a logical structure.
- #### Event
+ ### Event
Line range hint 101-101: Specify the language for the fenced code block to enable syntax highlighting.
- ```
+ ```yaml

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 8883af4 and 823701d.

Files selected for processing (4)

docs/source/configuration.md (1 hunks)
docs/source/index.md (1 hunks)
docs/source/terminology.md (1 hunks)
src/esgpt_task_querying/extract_subtree.py (1 hunks)

Files skipped from review due to trivial changes (1)

docs/source/index.md

Additional Context Used

Markdownlint (2)

docs/source/terminology.md (2)

124: Expected: h3; Actual: h4
Heading levels should only increment by one level at a time

101: null
Fenced code blocks should have a language specified

GitHub Check Runs (1)

codecov/patch failure (13)

src/esgpt_task_querying/extract_subtree.py: [warning] 257-257: src/esgpt_task_querying/extract_subtree.py#L257
Added line #L257 was not covered by tests

Additional comments not posted (2)

docs/source/configuration.md (1)
33-37: Consider removing the extra spaces before the list items for consistency.
- - `predicates`, stored as a dictionary from string predicate names (which must be unique) to either
+ - `predicates`, stored as a dictionary from string predicate names (which must be unique) to either
Likely invalid or redundant comment.
src/esgpt_task_querying/extract_subtree.py (1)

257-257: Ensure this newly added line is covered by unit tests to maintain code quality and prevent regressions.

coderabbitai · 2024-05-23T06:23:59Z

docs/source/configuration.md

+system works by defining a configuration object that details the underlying concepts, inclusion/exclusion, and
+labeling criteria for the cohort/task to be extracted, then using a recursive algorithm to identify all
+realizations of valid patient time-ranges of data that satisfy those constraints from the raw data. For more
+details on the recursive algorithm, see the `terminology.md` file. **TODO** better integrate, name, and link


The TODO comment about integrating documentation files should be addressed or tracked in a project management tool.

Would you like me to open a GitHub issue to track this documentation integration task?

coderabbitai · 2024-05-23T06:23:59Z

docs/source/configuration.md

+  `value_min` (with these options being decided on the basis of `value_min_inclusive`, where
+  `value_min_incusive=True` indicating that an observation satisfies this predicate if its value is greater
+  than or equal to `value_min`, and `value_min_inclusive=False` indicating a greater than but not equal to
+  will be used.
+- `value_max`: If specified, an observation will only satisfy this predicate if the occurrence of the
+  underlying `code` with a reported numerical value that is either less than or less than or equal to
+  `value_max` (with these options being decided on the basis of `value_max_inclusive`, where


Consider simplifying the phrase "on the basis of" to "based on" for clarity and conciseness.

- (with these options being decided on the basis of `value_min_inclusive`, where
+ (with these options being based on `value_min_inclusive`, where

Committable suggestion

- `value_min` (with these options being decided on the basis of `value_min_inclusive`, where
+ `value_min` (with these options being based on `value_min_inclusive`, where
  `value_min_incusive=True` indicating that an observation satisfies this predicate if its value is greater
  than or equal to `value_min`, and `value_min_inclusive=False` indicating a greater than but not equal to
  will be used.
  - `value_max`: If specified, an observation will only satisfy this predicate if the occurrence of the
  underlying `code` with a reported numerical value that is either less than or less than or equal to
  `value_max` (with these options being decided on the basis of `value_max_inclusive`, where
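The inclusive and exclusive bound semantics described in the quoted documentation can be sketched as a small check. This is an illustration only, not the project's implementation: the function name is hypothetical, and the defaults chosen here (no bound, exclusive comparison) are assumptions, since the thread does not state the defaults.

```python
from typing import Optional


def satisfies_value_bounds(
    value: float,
    value_min: Optional[float] = None,
    value_max: Optional[float] = None,
    value_min_inclusive: bool = False,  # assumed default, not from the source
    value_max_inclusive: bool = False,  # assumed default, not from the source
) -> bool:
    """Return True if `value` lies within the optional lower and upper bounds."""
    if value_min is not None:
        # Inclusive: value >= value_min; exclusive: value > value_min.
        if value_min_inclusive:
            if value < value_min:
                return False
        elif value <= value_min:
            return False
    if value_max is not None:
        # Inclusive: value <= value_max; exclusive: value < value_max.
        if value_max_inclusive:
            if value > value_max:
                return False
        elif value >= value_max:
            return False
    return True
```

A reading exactly equal to `value_min` passes only when `value_min_inclusive=True`, which is the distinction the documentation text is drawing.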

coderabbitai

Actionable comments posted: 0

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 823701d and 8a7d931.

Files ignored due to path filters (1)

sample_data/sample.csv is excluded by !**/*.csv

Files selected for processing (8)

run.py (2 hunks)
sample_configs/01-inhospital-mortality.yaml (1 hunks)
sample_configs/02-abnormal-lab.yaml (1 hunks)
sample_configs/06-long_term-incidence.yaml (1 hunks)
src/esgpt_task_querying/constraints.py (1 hunks)
src/esgpt_task_querying/predicates.py (1 hunks)
src/esgpt_task_querying/query.py (2 hunks)
src/esgpt_task_querying/utils.py (2 hunks)

Files skipped from review as they are similar to previous changes (4)

run.py
sample_configs/02-abnormal-lab.yaml
src/esgpt_task_querying/constraints.py
src/esgpt_task_querying/query.py

Additional comments not posted (5)

sample_configs/01-inhospital-mortality.yaml (1)

19-20: The addition of the _ANY_EVENT condition in the windows configuration seems appropriate for inhospital mortality scenarios.
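As a rough sketch of what a `has` condition such as `_ANY_EVENT: (5, None)` might check, the snippet below assumes the tuple is an inclusive `(min, max)` count range in which `None` means "no bound". That reading is an assumption; the thread does not spell out the semantics, and the function name is hypothetical.

```python
from typing import Optional, Tuple


def satisfies_count(count: int, bounds: Tuple[Optional[int], Optional[int]]) -> bool:
    """Check an event count against assumed inclusive (min, max) bounds."""
    lower, upper = bounds
    if lower is not None and count < lower:
        return False
    if upper is not None and count > upper:
        return False
    return True
```

Under this assumption, `(5, None)` would accept a window containing five or more events of any kind and reject a window with fewer.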

sample_configs/06-long_term-incidence.yaml (1)

6-27: Renaming predicates to include specific ICD codes and adding a combined predicate for myocardial infarction enhances clarity and specificity.

src/esgpt_task_querying/utils.py (2)

40-56: The implementation of capture_output and log_tree functions is well-done, providing a robust way to capture and log output.

1-7: The addition of new imports for output capturing and tree logging is appropriate for enhanced debugging and logging capabilities.
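The bodies of `capture_output` and `log_tree` are not quoted in this thread. One plausible shape for such helpers, sketched here purely as an assumption, redirects standard output into a buffer and then forwards each captured line to a logging callable. The parameter names `render_tree` and `log` are hypothetical.

```python
import contextlib
import io


@contextlib.contextmanager
def capture_output():
    """Temporarily redirect stdout into an in-memory buffer and yield the buffer."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        yield buffer


def log_tree(render_tree, log):
    """Run `render_tree` (which prints a tree) and pass each printed line to `log`."""
    with capture_output() as buffer:
        render_tree()
    for line in buffer.getvalue().splitlines():
        log(line)
```

A call such as `log_tree(lambda: print(tree), logger.info)` would then route printed tree output through the logger instead of writing it straight to the console.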

src/esgpt_task_querying/predicates.py (1)

13-90: Updating logging levels from debug to info and adding new functions for generating predicates from different data sources improve visibility and functionality.

coderabbitai

Actionable comments posted: 1

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 8a7d931 and ba45388.

Files selected for processing (1)

src/esgpt_task_querying/predicates.py (2 hunks)

Additional Context Used

GitHub Check Runs (1)

codecov/patch failure (30)

src/esgpt_task_querying/predicates.py: [warning] 13-14: src/esgpt_task_querying/predicates.py#L13-L14
Added lines #L13 - L14 were not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 19-24: src/esgpt_task_querying/predicates.py#L19-L24
Added lines #L19 - L24 were not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 27-28: src/esgpt_task_querying/predicates.py#L27-L28
Added lines #L27 - L28 were not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 32-33: src/esgpt_task_querying/predicates.py#L32-L33
Added lines #L32 - L33 were not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 38-41: src/esgpt_task_querying/predicates.py#L38-L41
Added lines #L38 - L41 were not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 44-46: src/esgpt_task_querying/predicates.py#L44-L46
Added lines #L44 - L46 were not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 50-53: src/esgpt_task_querying/predicates.py#L50-L53
Added lines #L50 - L53 were not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 60-63: src/esgpt_task_querying/predicates.py#L60-L63
Added lines #L60 - L63 were not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 69-71: src/esgpt_task_querying/predicates.py#L69-L71
Added lines #L69 - L71 were not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 73-76: src/esgpt_task_querying/predicates.py#L73-L76
Added lines #L73 - L76 were not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 78-79: src/esgpt_task_querying/predicates.py#L78-L79
Added lines #L78 - L79 were not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 82-82: src/esgpt_task_querying/predicates.py#L82
Added line #L82 was not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 84-84: src/esgpt_task_querying/predicates.py#L84
Added line #L84 was not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 87-87: src/esgpt_task_querying/predicates.py#L87
Added line #L87 was not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 96-96: src/esgpt_task_querying/predicates.py#L96
Added line #L96 was not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 99-100: src/esgpt_task_querying/predicates.py#L99-L100
Added lines #L99 - L100 were not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 118-119: src/esgpt_task_querying/predicates.py#L118-L119
Added lines #L118 - L119 were not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 122-132: src/esgpt_task_querying/predicates.py#L122-L132
Added lines #L122 - L132 were not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 135-135: src/esgpt_task_querying/predicates.py#L135
Added line #L135 was not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 138-138: src/esgpt_task_querying/predicates.py#L138
Added line #L138 was not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 142-142: src/esgpt_task_querying/predicates.py#L142
Added line #L142 was not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 144-144: src/esgpt_task_querying/predicates.py#L144
Added line #L144 was not covered by tests

src/esgpt_task_querying/predicates.py: [warning] 147-147: src/esgpt_task_querying/predicates.py#L147
Added line #L147 was not covered by tests

Additional comments not posted (3)

src/esgpt_task_querying/predicates.py (3)

49-100: Ensure test coverage for the function generate_plain_predicates_from_esgpt.

12-28: Ensure test coverage for the function verify_plain_predicates_from_csv.

31-46: Ensure test coverage for the function generate_plain_predicates_from_meds.

coderabbitai · 2024-05-23T20:11:14Z

src/esgpt_task_querying/predicates.py

    """Generate predicate columns based on the configuration.

    Args:
        cfg: The TaskExtractorConfig object containing the predicates information.
-        df_data: The Polars DataFrame containing the original external data.
+        data_path: Path to external data (file path to .csv or .parquet, or ESGPT directory) as
+            string or Path.
        standard: The data standard, either 'CSV, 'MEDS' or 'ESGPT'.

    Returns:
        predicates_df: The Polars DataFrame with the added predicate columns.

    Raises:
        ValueError: If an invalid predicate type is specified in the configuration.
-
-    Examples: TODO
    """
-    logger.debug("Generating predicate columns...")
-    predicate_cols = []
+    if isinstance(data_path, str):
+        data_path = Path(data_path)

    # plain predicates
-    match standard:
-        case "CSV":
-            for name, plain_predicate in cfg.plain_predicates.items():
-                data = data.with_columns(
-                    plain_predicate.ESGPT_eval_expr(plain_predicate.values_column).cast(pl.UInt16).alias(name)
-                )
-                logger.debug(f"Added predicate column '{name}'.")
-                predicate_cols.append(name)
-        case "MEDS":
-            for name, plain_predicate in cfg.plain_predicates.items():
-                data = data.with_columns(plain_predicate.MEDS_eval_expr().alias(name))
-                logger.debug(f"Added predicate column '{name}'.")
-                predicate_cols.append(name)
-        case "ESGPT":
-            for name, plain_predicate in cfg.plain_predicates.items():
-                if "event_type" in plain_predicate.code:
-                    data[0] = data[0].with_columns(
-                        plain_predicate.ESGPT_eval_expr(plain_predicate.values_column)
-                        .cast(pl.UInt16)
-                        .alias(name)
-                    )
-                else:
-                    data[1] = data[1].with_columns(
-                        plain_predicate.ESGPT_eval_expr(plain_predicate.values_column)
-                        .cast(pl.UInt16)
-                        .alias(name)
-                    )
-                logger.debug(f"Added predicate column '{name}'.")
-                predicate_cols.append(name)
-
-            # aggregate measurements (data[1]) by summing columns that are in count_cols, and taking the max
-            # for columns in boolean_cols -> new ver only maxing
-            data[1] = (
-                data[1]
-                .group_by(["event_id"])
-                .agg(
-                    *[pl.col(c).max().cast(pl.Int64) for c in data[1].columns if c in predicate_cols],
-                )
-            )
-
-            data = data[0].join(data[1], on="event_id", how="left")
-            data = data.select(
-                "subject_id",
-                "timestamp",
-                *[pl.col(c) for c in data.columns if c in predicate_cols],
-            )
+    plain_predicates = cfg.plain_predicates
+    match standard.lower():
+        case "csv":
+            data = verify_plain_predicates_from_csv(data_path, plain_predicates)
+        case "meds":
+            data = generate_plain_predicates_from_meds(data_path, plain_predicates)
+        case "esgpt":
+            data = generate_plain_predicates_from_esgpt(data_path, plain_predicates)
+        case _:
+            raise ValueError(f"Invalid data standard: {standard}")
+    predicate_cols = list(plain_predicates.keys())

    # derived predicates
+    logger.info("Generating derived predicate columns...")
    for name, code in cfg.derived_predicates.items():
        data = data.with_columns(code.eval_expr().cast(pl.UInt16).alias(name))
-        logger.debug(f"Added predicate column '{name}'.")
+        logger.info(f"Added predicate column '{name}'.")
        predicate_cols.append(name)

    # add a column of 1s representing any predicate
+    logger.info("Generating '_ANY_EVENT' predicate column...")
    data = data.with_columns(pl.lit(1).alias(ANY_EVENT_COLUMN).cast(pl.UInt16))
-    logger.debug(f"Added predicate column '{ANY_EVENT_COLUMN}'.")
+    logger.info(f"Added predicate column '{ANY_EVENT_COLUMN}'.")
    predicate_cols.append(ANY_EVENT_COLUMN)

-    data = data.sort(by=["subject_id", "timestamp"]).select(["subject_id", "timestamp"] + predicate_cols)
-
-    return data
+    return data.sort(by=["subject_id", "timestamp"])


Tip
Codebase Verification

Ensure test coverage for the function generate_predicates_df. No test files or references to this function were found in the codebase.

generate_predicates_df is used in run.py but lacks direct test coverage.

Analysis chain

Ensure test coverage for the function generate_predicates_df.

Scripts Executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify test coverage for `generate_predicates_df`.
# Test: Search for the function in test files. Expect: At least one occurrence.
rg --type python $'generate_predicates_df' --glob '*test*.py'

Length of output: 91

Script:

#!/bin/bash
# Description: Verify test coverage for `generate_predicates_df`.
# Test: Search for the function in test files. Expect: At least one occurrence.
rg --type py $'generate_predicates_df' --glob '*test*.py'

Length of output: 205

Script:

#!/bin/bash
# Description: Diagnose why no files were searched and refine the search for test coverage.
# Test: Use --debug to understand why files are being skipped.
rg --type py $'generate_predicates_df' --glob '*test*.py' --debug

Length of output: 13632

Script:

#!/bin/bash
# Description: Refine the search for test coverage by including hidden files and overriding ignore patterns.
# Test: Search for the function in test files, including hidden files and overriding ignore patterns.
rg --type py $'generate_predicates_df' --glob '*test*.py' --hidden --no-ignore

Length of output: 226

Script:

#!/bin/bash
# Description: Broaden the search for test coverage by including all Python files.
# Test: Search for the function in all Python files.
rg --type py $'generate_predicates_df' --hidden --no-ignore

Length of output: 452

Script:

#!/bin/bash
# Description: List all test files and check their contents for references to `generate_predicates_df`.
# List all test files.
fd --type f --glob '*test*.py'
# Check contents of test files for references to `generate_predicates_df`.
fd --type f --glob '*test*.py' --exec rg 'generate_predicates_df' {}

Length of output: 103
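The same kind of search can be approximated in Python when `rg` or `fd` is unavailable. This is a rough equivalent written for illustration, not a script that was run in this analysis; the function name `find_references` and its defaults are hypothetical.

```python
from pathlib import Path


def find_references(root, name, pattern="*test*.py"):
    """Return (relative_path, line_number) pairs where `name` appears in matching files."""
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if name in line:
                hits.append((str(path.relative_to(root)), lineno))
    return hits
```

An empty result from `find_references(".", "generate_predicates_df")` would correspond to the conclusion reached above: no test file mentions the function.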

Move files

0846606

justin13601 self-assigned this May 22, 2024

Change to logger.info

613db2d