-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Final fixes #30
Final fixes #30
Conversation
WalkthroughThe recent updates involve transitioning logging levels from Changes
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (1)
result.csv
is excluded by!**/*.csv
Files selected for processing (6)
- run.py (1 hunks)
- src/esgpt_task_querying/config.py (1 hunks)
- src/esgpt_task_querying/constraints.py (1 hunks)
- src/esgpt_task_querying/extract_subtree.py (1 hunks)
- src/esgpt_task_querying/predicates.py (4 hunks)
- src/esgpt_task_querying/query.py (1 hunks)
Files skipped from review due to trivial changes (6)
- run.py
- src/esgpt_task_querying/config.py
- src/esgpt_task_querying/constraints.py
- src/esgpt_task_querying/extract_subtree.py
- src/esgpt_task_querying/predicates.py
- src/esgpt_task_querying/query.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- sample_configs/02-abnormal-lab.yaml (1 hunks)
- src/esgpt_task_querying/config.py (13 hunks)
- src/esgpt_task_querying/predicates.py (2 hunks)
Files skipped from review as they are similar to previous changes (2)
- src/esgpt_task_querying/config.py
- src/esgpt_task_querying/predicates.py
Additional comments not posted (5)
sample_configs/02-abnormal-lab.yaml (5)
10-10
: The definition ofabnormally_low_spo2
is clear and logically consistent.
14-17
: The definition ofabnormally_high_spo2
is clear and logically consistent.
18-19
: The definition ofabnormal_spo2
using an OR expression to combine low and high conditions is appropriate and clear.
19-19
: The use ofnormal_spo2
as the trigger event is appropriate and aligns with the defined normal range.
Line range hint
21-37
: The configuration of time windows for SpO2 monitoring is well-structured and clear.
Codecov ReportAttention: Patch coverage is
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Outside diff range and nitpick comments (2)
docs/source/terminology.md (2)
Line range hint
124-124
: Correct the heading level to maintain a logical structure.- #### Event + ### Event
Line range hint
101-101
: Specify the language for the fenced code block to enable syntax highlighting.- ```yaml + ```yaml
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (4)
- docs/source/configuration.md (1 hunks)
- docs/source/index.md (1 hunks)
- docs/source/terminology.md (1 hunks)
- src/esgpt_task_querying/extract_subtree.py (1 hunks)
Files skipped from review due to trivial changes (1)
- docs/source/index.md
Additional Context Used
Markdownlint (2)
docs/source/terminology.md (2)
124: Expected: h3; Actual: h4
Heading levels should only increment by one level at a time
101: null
Fenced code blocks should have a language specified
GitHub Check Runs (1)
codecov/patch failure (13)
src/esgpt_task_querying/extract_subtree.py: [warning] 257-257: src/esgpt_task_querying/extract_subtree.py#L257
Added line #L257 was not covered by tests
Additional comments not posted (2)
docs/source/configuration.md (1)
33-37
: Consider removing the extra spaces before the list items for consistency.- - `predicates`, stored as a dictionary from string predicate names (which must be unique) to either + - `predicates`, stored as a dictionary from string predicate names (which must be unique) to eitherLikely invalid or redundant comment.
src/esgpt_task_querying/extract_subtree.py (1)
257-257
: Ensure this newly added line is covered by unit tests to maintain code quality and prevent regressions.
system works by defining a configuration object that details the underlying concepts, inclusion/exclusion, and | ||
labeling criteria for the cohort/task to be extracted, then using a recursive algorithm to identify all | ||
realizations of valid patient time-ranges of data that satisfy those constraints from the raw data. For more | ||
details on the recursive algorithm, see the `terminology.md` file. **TODO** better integrate, name, and link |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TODO comment about integrating documentation files should be addressed or tracked in a project management tool.
Would you like me to open a GitHub issue to track this documentation integration task?
`value_min` (with these options being decided on the basis of `value_min_inclusive`, where | ||
`value_min_incusive=True` indicating that an observation satisfies this predicate if its value is greater | ||
than or equal to `value_min`, and `value_min_inclusive=False` indicating a greater than but not equal to | ||
will be used. | ||
- `value_max`: If specified, an observation will only satisfy this predicate if the occurrence of the | ||
underlying `code` with a reported numerical value that is either less than or less than or equal to | ||
`value_max` (with these options being decided on the basis of `value_max_inclusive`, where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider simplifying the phrase "on the basis of" to "based on" for clarity and conciseness.
- (with these options being decided on the basis of `value_min_inclusive`, where
+ (with these options being based on `value_min_inclusive`, where
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
`value_min` (with these options being decided on the basis of `value_min_inclusive`, where | |
`value_min_incusive=True` indicating that an observation satisfies this predicate if its value is greater | |
than or equal to `value_min`, and `value_min_inclusive=False` indicating a greater than but not equal to | |
will be used. | |
- `value_max`: If specified, an observation will only satisfy this predicate if the occurrence of the | |
underlying `code` with a reported numerical value that is either less than or less than or equal to | |
`value_max` (with these options being decided on the basis of `value_max_inclusive`, where | |
`value_min` (with these options being based on `value_min_inclusive`, where | |
`value_min_incusive=True` indicating that an observation satisfies this predicate if its value is greater | |
than or equal to `value_min`, and `value_min_inclusive=False` indicating a greater than but not equal to | |
will be used. | |
- `value_max`: If specified, an observation will only satisfy this predicate if the occurrence of the | |
underlying `code` with a reported numerical value that is either less than or less than or equal to | |
`value_max` (with these options being decided on the basis of `value_max_inclusive`, where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (1)
sample_data/sample.csv
is excluded by!**/*.csv
Files selected for processing (8)
- run.py (2 hunks)
- sample_configs/01-inhospital-mortality.yaml (1 hunks)
- sample_configs/02-abnormal-lab.yaml (1 hunks)
- sample_configs/06-long_term-incidence.yaml (1 hunks)
- src/esgpt_task_querying/constraints.py (1 hunks)
- src/esgpt_task_querying/predicates.py (1 hunks)
- src/esgpt_task_querying/query.py (2 hunks)
- src/esgpt_task_querying/utils.py (2 hunks)
Files skipped from review as they are similar to previous changes (4)
- run.py
- sample_configs/02-abnormal-lab.yaml
- src/esgpt_task_querying/constraints.py
- src/esgpt_task_querying/query.py
Additional comments not posted (5)
sample_configs/01-inhospital-mortality.yaml (1)
19-20
: The addition of the_ANY_EVENT
condition in thewindows
configuration seems appropriate for inhospital mortality scenarios.sample_configs/06-long_term-incidence.yaml (1)
6-27
: Renaming predicates to include specific ICD codes and adding a combined predicate for myocardial infarction enhances clarity and specificity.src/esgpt_task_querying/utils.py (2)
40-56
: The implementation ofcapture_output
andlog_tree
functions is well-done, providing a robust way to capture and log output.
1-7
: The addition of new imports for output capturing and tree logging is appropriate for enhanced debugging and logging capabilities.src/esgpt_task_querying/predicates.py (1)
13-90
: Updating logging levels fromdebug
toinfo
and adding new functions for generating predicates from different data sources improve visibility and functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- src/esgpt_task_querying/predicates.py (2 hunks)
Additional Context Used
GitHub Check Runs (1)
codecov/patch failure (30)
src/esgpt_task_querying/predicates.py: [warning] 13-14: src/esgpt_task_querying/predicates.py#L13-L14
Added lines #L13 - L14 were not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 19-24: src/esgpt_task_querying/predicates.py#L19-L24
Added lines #L19 - L24 were not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 27-28: src/esgpt_task_querying/predicates.py#L27-L28
Added lines #L27 - L28 were not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 32-33: src/esgpt_task_querying/predicates.py#L32-L33
Added lines #L32 - L33 were not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 38-41: src/esgpt_task_querying/predicates.py#L38-L41
Added lines #L38 - L41 were not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 44-46: src/esgpt_task_querying/predicates.py#L44-L46
Added lines #L44 - L46 were not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 50-53: src/esgpt_task_querying/predicates.py#L50-L53
Added lines #L50 - L53 were not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 60-63: src/esgpt_task_querying/predicates.py#L60-L63
Added lines #L60 - L63 were not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 69-71: src/esgpt_task_querying/predicates.py#L69-L71
Added lines #L69 - L71 were not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 73-76: src/esgpt_task_querying/predicates.py#L73-L76
Added lines #L73 - L76 were not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 78-79: src/esgpt_task_querying/predicates.py#L78-L79
Added lines #L78 - L79 were not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 82-82: src/esgpt_task_querying/predicates.py#L82
Added line #L82 was not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 84-84: src/esgpt_task_querying/predicates.py#L84
Added line #L84 was not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 87-87: src/esgpt_task_querying/predicates.py#L87
Added line #L87 was not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 96-96: src/esgpt_task_querying/predicates.py#L96
Added line #L96 was not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 99-100: src/esgpt_task_querying/predicates.py#L99-L100
Added lines #L99 - L100 were not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 118-119: src/esgpt_task_querying/predicates.py#L118-L119
Added lines #L118 - L119 were not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 122-132: src/esgpt_task_querying/predicates.py#L122-L132
Added lines #L122 - L132 were not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 135-135: src/esgpt_task_querying/predicates.py#L135
Added line #L135 was not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 138-138: src/esgpt_task_querying/predicates.py#L138
Added line #L138 was not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 142-142: src/esgpt_task_querying/predicates.py#L142
Added line #L142 was not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 144-144: src/esgpt_task_querying/predicates.py#L144
Added line #L144 was not covered by tests
src/esgpt_task_querying/predicates.py: [warning] 147-147: src/esgpt_task_querying/predicates.py#L147
Added line #L147 was not covered by tests
Additional comments not posted (3)
src/esgpt_task_querying/predicates.py (3)
49-100
: Ensure test coverage for the functiongenerate_plain_predicates_from_esgpt
.
12-28
: Ensure test coverage for the functionverify_plain_predicates_from_csv
.
31-46
: Ensure test coverage for the functiongenerate_plain_predicates_from_meds
.
"""Generate predicate columns based on the configuration. | ||
|
||
Args: | ||
cfg: The TaskExtractorConfig object containing the predicates information. | ||
df_data: The Polars DataFrame containing the original external data. | ||
data_path: Path to external data (file path to .csv or .parquet, or ESGPT directory) as | ||
string or Path. | ||
standard: The data standard, either 'CSV, 'MEDS' or 'ESGPT'. | ||
|
||
Returns: | ||
predicates_df: The Polars DataFrame with the added predicate columns. | ||
|
||
Raises: | ||
ValueError: If an invalid predicate type is specified in the configuration. | ||
|
||
Examples: TODO | ||
""" | ||
logger.debug("Generating predicate columns...") | ||
predicate_cols = [] | ||
if isinstance(data_path, str): | ||
data_path = Path(data_path) | ||
|
||
# plain predicates | ||
match standard: | ||
case "CSV": | ||
for name, plain_predicate in cfg.plain_predicates.items(): | ||
data = data.with_columns( | ||
plain_predicate.ESGPT_eval_expr(plain_predicate.values_column).cast(pl.UInt16).alias(name) | ||
) | ||
logger.debug(f"Added predicate column '{name}'.") | ||
predicate_cols.append(name) | ||
case "MEDS": | ||
for name, plain_predicate in cfg.plain_predicates.items(): | ||
data = data.with_columns(plain_predicate.MEDS_eval_expr().alias(name)) | ||
logger.debug(f"Added predicate column '{name}'.") | ||
predicate_cols.append(name) | ||
case "ESGPT": | ||
for name, plain_predicate in cfg.plain_predicates.items(): | ||
if "event_type" in plain_predicate.code: | ||
data[0] = data[0].with_columns( | ||
plain_predicate.ESGPT_eval_expr(plain_predicate.values_column) | ||
.cast(pl.UInt16) | ||
.alias(name) | ||
) | ||
else: | ||
data[1] = data[1].with_columns( | ||
plain_predicate.ESGPT_eval_expr(plain_predicate.values_column) | ||
.cast(pl.UInt16) | ||
.alias(name) | ||
) | ||
logger.debug(f"Added predicate column '{name}'.") | ||
predicate_cols.append(name) | ||
|
||
# aggregate measurements (data[1]) by summing columns that are in count_cols, and taking the max | ||
# for columns in boolean_cols -> new ver only maxing | ||
data[1] = ( | ||
data[1] | ||
.group_by(["event_id"]) | ||
.agg( | ||
*[pl.col(c).max().cast(pl.Int64) for c in data[1].columns if c in predicate_cols], | ||
) | ||
) | ||
|
||
data = data[0].join(data[1], on="event_id", how="left") | ||
data = data.select( | ||
"subject_id", | ||
"timestamp", | ||
*[pl.col(c) for c in data.columns if c in predicate_cols], | ||
) | ||
plain_predicates = cfg.plain_predicates | ||
match standard.lower(): | ||
case "csv": | ||
data = verify_plain_predicates_from_csv(data_path, plain_predicates) | ||
case "meds": | ||
data = generate_plain_predicates_from_meds(data_path, plain_predicates) | ||
case "esgpt": | ||
data = generate_plain_predicates_from_esgpt(data_path, plain_predicates) | ||
case _: | ||
raise ValueError(f"Invalid data standard: {standard}") | ||
predicate_cols = list(plain_predicates.keys()) | ||
|
||
# derived predicates | ||
logger.info("Generating derived predicate columns...") | ||
for name, code in cfg.derived_predicates.items(): | ||
data = data.with_columns(code.eval_expr().cast(pl.UInt16).alias(name)) | ||
logger.debug(f"Added predicate column '{name}'.") | ||
logger.info(f"Added predicate column '{name}'.") | ||
predicate_cols.append(name) | ||
|
||
# add a column of 1s representing any predicate | ||
logger.info("Generating '_ANY_EVENT' predicate column...") | ||
data = data.with_columns(pl.lit(1).alias(ANY_EVENT_COLUMN).cast(pl.UInt16)) | ||
logger.debug(f"Added predicate column '{ANY_EVENT_COLUMN}'.") | ||
logger.info(f"Added predicate column '{ANY_EVENT_COLUMN}'.") | ||
predicate_cols.append(ANY_EVENT_COLUMN) | ||
|
||
data = data.sort(by=["subject_id", "timestamp"]).select(["subject_id", "timestamp"] + predicate_cols) | ||
|
||
return data | ||
return data.sort(by=["subject_id", "timestamp"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tip
Codebase Verification
Ensure test coverage for the function generate_predicates_df
. No test files or references to this function were found in the codebase.
generate_predicates_df
is used inrun.py
but lacks direct test coverage.
Analysis chain
Ensure test coverage for the function generate_predicates_df
.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify test coverage for `generate_predicates_df`.
# Test: Search for the function in test files. Expect: At least one occurrence.
rg --type python $'generate_predicates_df' --glob '*test*.py'
Length of output: 91
Script:
#!/bin/bash
# Description: Verify test coverage for `generate_predicates_df`.
# Test: Search for the function in test files. Expect: At least one occurrence.
rg --type py $'generate_predicates_df' --glob '*test*.py'
Length of output: 205
Script:
#!/bin/bash
# Description: Diagnose why no files were searched and refine the search for test coverage.
# Test: Use --debug to understand why files are being skipped.
rg --type py $'generate_predicates_df' --glob '*test*.py' --debug
Length of output: 13632
Script:
#!/bin/bash
# Description: Refine the search for test coverage by including hidden files and overriding ignore patterns.
# Test: Search for the function in test files, including hidden files and overriding ignore patterns.
rg --type py $'generate_predicates_df' --glob '*test*.py' --hidden --no-ignore
Length of output: 226
Script:
#!/bin/bash
# Description: Broaden the search for test coverage by including all Python files.
# Test: Search for the function in all Python files.
rg --type py $'generate_predicates_df' --hidden --no-ignore
Length of output: 452
Script:
#!/bin/bash
# Description: List all test files and check their contents for references to `generate_predicates_df`.
# List all test files.
fd --type f --glob '*test*.py'
# Check contents of test files for references to `generate_predicates_df`.
fd --type f --glob '*test*.py' --exec rg 'generate_predicates_df' {}
Length of output: 103
Summary by CodeRabbit
New Features
has: _ANY_EVENT: (5, None)
ininhospital-mortality.yaml
.abnormally_high_spo2
and redefinedabnormal_spo2
as a combination of low and high SpO2 conditions inabnormal-lab.yaml
.Updates
long_term-incidence.yaml
to specific ICD9CM and ICD10CM codes.Enhancements
debug
toinfo
for better visibility of actions and processes across multiple modules and functions.Documentation
configuration.md
.index.md
andterminology.md
for clarity.