Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TaskExtractorConfig accepts optional predicates_path to split predica… #82

Merged
merged 5 commits into from
Aug 5, 2024

Conversation

justin13601
Copy link
Owner

@justin13601 justin13601 commented Jul 29, 2024

…tes from main config_path (#42)

Summary by CodeRabbit

  • New Features

    • Expanded the configuration loading functionality to support an optional predicates file for enhanced data handling.
    • Introduced filtering capabilities based on static patient demographics to improve data processing.
    • Enhanced data processing functions to accommodate additional demographic columns, improving analysis granularity.
  • Bug Fixes

    • Improved error handling for invalid predicates file paths and formats, providing clearer feedback for users.
    • Updated test scenarios to better reflect the new data structure and ensure accurate outcomes.

Copy link
Contributor

coderabbitai bot commented Jul 29, 2024

Walkthrough

The recent changes enhance the functionality of the codebase, particularly by improving the load method of the TaskExtractorConfig class in src/aces/config.py. An optional predicates_path parameter has been introduced for specifying a predicates YAML file, enriching configuration management. Additionally, new functions and updates to data structures support more nuanced analysis of patient demographics and static variables, improving overall data integrity and usability.

Changes

Files Change Summary
src/aces/config.py Updated load method to accept an optional predicates_path with validation checks and improved error handling. Added a static attribute to related classes.
sample_configs/inhospital_mortality.yaml Added patient_demographics section with a male demographic, enhancing patient data categorization.
src/aces/constraints.py Introduced check_static_variables function to filter DataFrame based on static demographics, enhancing data integrity and analysis.
src/aces/predicates.py Enhanced functions to support new demographic columns and updated logic for processing, including static predicates in process_esgpt_data.
src/aces/query.py Modified query function to incorporate filtering based on static variables, improving data processing capabilities.
tests/test_e2e.py Updated test data structure to include demographic information and adjusted expected outputs accordingly.
tests/test_check_static_variables.py New test suite for check_static_variables function to ensure accurate filtering and robust error handling.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant TaskExtractorConfig
    participant YAMLLoader

    User->>TaskExtractorConfig: load(config_path, predicates_path)
    TaskExtractorConfig->>YAMLLoader: validate(predicates_path)
    alt If predicates_path is valid
        YAMLLoader-->>TaskExtractorConfig: load predicates from predicates_path
        TaskExtractorConfig->>User: return TaskExtractorConfig with new predicates
    else If predicates_path is invalid
        TaskExtractorConfig->>User: raise error
    end
    TaskExtractorConfig->>YAMLLoader: load config from config_path
    TaskExtractorConfig->>User: return TaskExtractorConfig
Loading

Poem

🐰 In the meadow, changes dance,
Configs twirl in a joyful prance.
With predicates now in the mix,
Our tasks align, a clever fix!
So hop along, let’s celebrate,
Robust loading, isn’t it great? 🌼


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

codecov bot commented Jul 29, 2024

Codecov Report

Attention: Patch coverage is 90.47619% with 6 lines in your changes missing coverage. Please review.

Files Patch % Lines
src/aces/config.py 94.11% 2 Missing ⚠️
src/aces/predicates.py 83.33% 2 Missing ⚠️
src/aces/query.py 71.42% 2 Missing ⚠️
Files Coverage Δ
src/aces/constraints.py 100.00% <100.00%> (ø)
src/aces/config.py 95.34% <94.11%> (+7.47%) ⬆️
src/aces/predicates.py 82.94% <83.33%> (-0.12%) ⬇️
src/aces/query.py 88.63% <71.42%> (-3.47%) ⬇️

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 3817bd4 and f8930ae.

Files selected for processing (1)
  • src/aces/config.py (2 hunks)
Additional comments not posted (7)
src/aces/config.py (7)

811-811: Method signature update.

The method signature has been updated to accept an additional optional parameter predicates_path. This change enhances the method's flexibility by allowing the user to specify a separate path for predicates.


835-836: Check if predicates_path is provided.

The code correctly checks if predicates_path is provided before proceeding with further logic.


836-837: Convert predicates_path to Path object.

The code correctly converts predicates_path to a Path object if it is a string.


839-842: Check if predicates_path exists and is a file.

The code correctly checks if predicates_path exists and is a file. If not, it raises a FileNotFoundError.


844-850: Load predicates file and check file extension.

The code correctly loads the predicates file if it has a .yaml extension. If the extension is not .yaml, it raises a ValueError.


852-860: Extract and validate predicates from the loaded file.

The code extracts the predicates from the loaded file and removes the description key if it exists. It also raises a ValueError if there are unrecognized keys in the configuration file.


862-863: Default to extracting predicates from the main config file.

If predicates_path is not provided, the code defaults to extracting predicates from the main configuration file.

* Make windows struct optional when loading config

* Update spacing
@justin13601 justin13601 self-assigned this Aug 2, 2024
* Added patient_demographics and parsing of static variables (#48)

* Added filtering based on static predicates (#48)

* Fix tests and column renames to match MEDS

* Update e2e test
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Outside diff range, codebase verification and nitpick comments (2)
src/aces/config.py (2)

868-870: Default handling of predicates when predicates_path is not provided

The current tests in tests/test_e2e.py do not explicitly cover the scenario where predicates_path is not provided. To ensure robust testing, please add a test case that verifies the default handling of predicates when predicates_path is not set.

  • Add a test case where predicates_path is not provided and verify the behavior.
Analysis chain

Default handling of predicates when predicates_path is not provided

The default handling of predicates when predicates_path is not provided is consistent with the previous logic. Ensure that this path is also covered by tests.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify if the default handling of predicates when `predicates_path` is not provided is covered by tests.

# Test: Search for test cases that cover the default logic. Expect: Test cases covering the default logic.
rg --type python -A 5 'load\(.*config_path'

Length of output: 76


Script:

#!/bin/bash
# Description: Verify if the default handling of predicates when `predicates_path` is not provided is covered by tests.

# Find all Python test files and search for test cases that cover the default logic.
fd --extension py --glob '*test*' | xargs rg 'load\(.*config_path'

Length of output: 68


Script:

#!/bin/bash
# Description: Broaden the search to verify if the default handling of predicates when `predicates_path` is not provided is covered by tests.

# Find all Python test files and search for references to `predicates` and `predicates_path`.
fd --extension py --glob '*test*' | xargs rg 'predicates|predicates_path'

Length of output: 399


Script:

#!/bin/bash
# Description: Extract relevant portions of `tests/test_e2e.py` to analyze the test cases involving `predicates`.

# Display lines around the references to `predicates` in `tests/test_e2e.py`.
rg 'predicates|predicates_path' -A 10 -B 10 tests/test_e2e.py

Length of output: 2196


840-867: Tests missing for predicates_path handling

The new logic for handling the predicates_path parameter is not covered by tests. Please add test cases to ensure this functionality is properly validated.

  • No references to predicates_path were found in any test files.
Analysis chain

Handling of predicates_path parameter

The handling of the predicates_path parameter includes checking if the path is valid and loading the predicates from the specified file. The error handling is appropriate, raising exceptions for missing files or incorrect file types. Ensure that this new logic is covered by tests.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify if the new logic for handling `predicates_path` is covered by tests.

# Test: Search for test cases that cover the new logic. Expect: Test cases covering the new logic.
rg --type python -A 5 'load\(.*predicates_path'

Length of output: 80


Script:

#!/bin/bash
# Description: Verify if the new logic for handling `predicates_path` is covered by tests.

# Test: Search for test cases that cover the new logic. Expect: Test cases covering the new logic.
rg --type-add 'py:*.py' -t py -A 5 'predicates_path'

Length of output: 1690


Script:

#!/bin/bash
# Description: Verify if the new logic for handling `predicates_path` is covered by tests.

# Search for references to `predicates_path` within typical test directories.
rg --type-add 'py:*.py' -t py -A 5 'predicates_path' tests test

Length of output: 108


Script:

#!/bin/bash
# Description: Verify if the new logic for handling `predicates_path` is covered by tests.

# Locate all Python test files and search for references to `predicates_path`.
fd --type f --extension py | xargs rg 'predicates_path'

Length of output: 831

Tools
GitHub Check: codecov/patch

[warning] 841-842: src/aces/config.py#L841-L842
Added lines #L841 - L842 were not covered by tests


[warning] 844-845: src/aces/config.py#L844-L845
Added lines #L844 - L845 were not covered by tests


[warning] 849-851: src/aces/config.py#L849-L851
Added lines #L849 - L851 were not covered by tests


[warning] 853-853: src/aces/config.py#L853
Added line #L853 was not covered by tests


[warning] 857-858: src/aces/config.py#L857-L858
Added lines #L857 - L858 were not covered by tests


[warning] 861-861: src/aces/config.py#L861
Added line #L861 was not covered by tests


[warning] 863-864: src/aces/config.py#L863-L864
Added lines #L863 - L864 were not covered by tests

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between f8930ae and ad6f312.

Files ignored due to path filters (1)
  • sample_data/sample_data.csv is excluded by !**/*.csv
Files selected for processing (6)
  • sample_configs/inhospital_mortality.yaml (1 hunks)
  • src/aces/config.py (8 hunks)
  • src/aces/constraints.py (1 hunks)
  • src/aces/predicates.py (17 hunks)
  • src/aces/query.py (2 hunks)
  • tests/test_e2e.py (8 hunks)
Additional context used
GitHub Check: codecov/patch
src/aces/query.py

[warning] 65-66: src/aces/query.py#L65-L66
Added lines #L65 - L66 were not covered by tests

src/aces/constraints.py

[warning] 152-152: src/aces/constraints.py#L152
Added line #L152 was not covered by tests

src/aces/predicates.py

[warning] 501-501: src/aces/predicates.py#L501
Added line #L501 was not covered by tests


[warning] 514-514: src/aces/predicates.py#L514
Added line #L514 was not covered by tests

src/aces/config.py

[warning] 69-69: src/aces/config.py#L69
Added line #L69 was not covered by tests


[warning] 841-842: src/aces/config.py#L841-L842
Added lines #L841 - L842 were not covered by tests


[warning] 844-845: src/aces/config.py#L844-L845
Added lines #L844 - L845 were not covered by tests


[warning] 849-851: src/aces/config.py#L849-L851
Added lines #L849 - L851 were not covered by tests


[warning] 853-853: src/aces/config.py#L853
Added line #L853 was not covered by tests


[warning] 857-858: src/aces/config.py#L857-L858
Added lines #L857 - L858 were not covered by tests


[warning] 861-861: src/aces/config.py#L861
Added line #L861 was not covered by tests


[warning] 863-864: src/aces/config.py#L863-L864
Added lines #L863 - L864 were not covered by tests


[warning] 898-899: src/aces/config.py#L898-L899
Added lines #L898 - L899 were not covered by tests

Additional comments not posted (21)
sample_configs/inhospital_mortality.yaml (1)

12-14: LGTM!

The addition of patient_demographics with the male demographic is correct and logically integrates within the predicates section.

src/aces/query.py (2)

12-12: LGTM!

The import check_static_variables from the constraints module is necessary for the new logic introduced in the query function.


60-66: LGTM!

The new logic for static variable filtering in the query function is correct and logically integrates within the existing structure.

Tools
GitHub Check: codecov/patch

[warning] 65-66: src/aces/query.py#L65-L66
Added lines #L65 - L66 were not covered by tests

src/aces/constraints.py (1)

103-167: LGTM!

The new function check_static_variables correctly filters a DataFrame based on static patient demographics and logically integrates within the existing structure.

Tools
GitHub Check: codecov/patch

[warning] 152-152: src/aces/constraints.py#L152
Added line #L152 was not covered by tests

tests/test_e2e.py (6)

19-19: LGTM!

The constant EVENT_INDEX_TYPE is correctly set to pl.UInt64.


21-21: LGTM!

The constant LAST_EVENT_INDEX_COLUMN is correctly set to "_LAST_EVENT_INDEX".


26-74: LGTM!

The input data format changes, including the male and female columns, are consistent with the enhanced data schema to better capture patient demographics.


97-100: LGTM!

The addition of the patient_demographics field and the male predicate enhances the task configuration to handle demographic data.


133-135: LGTM!

The expected output changes are consistent with the updated input data and task configuration.


299-303: LGTM!

The changes in the data type casting logic to accommodate the new EVENT_INDEX_TYPE ensure that the event indices are handled correctly.

src/aces/predicates.py (8)

38-50: LGTM!

The function direct_load_plain_predicates has been correctly updated to include additional columns: is_male, is_female, and brown_eyes.


203-203: LGTM!

The data selection logic in direct_load_plain_predicates correctly includes the new columns.


295-305: LGTM!

The function generate_plain_predicates_from_meds has been correctly updated to accommodate a new column for male in the predicate configuration.


319-319: LGTM!

The data loading logic in generate_plain_predicates_from_meds correctly handles the new columns.


341-341: LGTM!

The function process_esgpt_data has been correctly updated to include a new parameter subjects_df and enhance the data processing logic.


421-424: LGTM!

The logic for handling static predicates in process_esgpt_data correctly integrates the new parameter subjects_df.


536-554: LGTM!

The method get_predicates_df has been correctly updated to handle the new predicates_path parameter and enhance the data processing logic.


Line range hint 700-732:
LGTM!

The logic for generating special predicate columns in get_predicates_df correctly integrates the new changes.

src/aces/config.py (3)

36-36: Addition of static attribute in PlainPredicateConfig

The addition of the static attribute to the PlainPredicateConfig class is clear and consistent with the class's purpose. Ensure that this attribute is properly utilized in the rest of the codebase.


183-183: Addition of static attribute in DerivedPredicateConfig

The addition of the static attribute to the DerivedPredicateConfig class is clear and consistent with the class's purpose. Ensure that this attribute is properly utilized in the rest of the codebase.


816-816: Update method signature to include predicates_path

The method signature has been updated to include an optional predicates_path parameter. This change is clear and consistent with the PR objectives.

src/aces/query.py Show resolved Hide resolved
src/aces/constraints.py Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between ad6f312 and 528a698.

Files selected for processing (2)
  • src/aces/config.py (9 hunks)
  • tests/test_check_static_variables.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • src/aces/config.py
Additional comments not posted (4)
tests/test_check_static_variables.py (4)

1-7: Imports look good.

The imports include necessary standard libraries, third-party libraries, and a local module.


9-29: Test data setup looks comprehensive.

The sample DataFrame includes various columns and data types, covering different scenarios.


31-47: Test cases look well-designed.

The test cases cover filtering based on a demographic column and checking for a ValueError when a column is missing.

Consider adding tests for additional edge cases, such as:

  1. Empty DataFrame.
  2. DataFrame with missing values in the demographic columns.
  3. Multiple demographic columns.

54-55: Main block is appropriate.

The main block correctly runs the tests using pytest.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (1)
src/aces/config.py (1)

1021-1021: Update the docstring to reflect the new parameter.

The docstring should include the predicates_path parameter to maintain clarity and completeness.

        """
+        Args:
+            config_path: The path to which a configuration object will be read from in YAML form.
+            predicates_path: The path to an optional predicates configuration file in YAML form.
+
        Raises:
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 528a698 and 436a13d.

Files selected for processing (1)
  • src/aces/config.py (14 hunks)
Additional context used
GitHub Check: codecov/patch
src/aces/config.py

[warning] 1104-1105: src/aces/config.py#L1104-L1105
Added lines #L1104 - L1105 were not covered by tests

Additional comments not posted (3)
src/aces/config.py (3)

36-36: LGTM!

The addition of the static attribute is well-integrated.


210-210: LGTM!

The addition of the static attribute is well-integrated.


1045-1075: LGTM! But verify the function usage in the codebase.

The code changes are approved.

However, ensure that all function calls to TaskExtractorConfig.load match the new signature.

Verification successful

Function usage matches the new signature

The function TaskExtractorConfig.load has been verified to match the new signature with an optional predicates_path parameter. The usage in src/aces/__main__.py correctly calls this function without the optional parameter, which is acceptable.

  • src/aces/config.py: Verified function signature.
  • src/aces/__main__.py: Verified function usage.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `TaskExtractorConfig.load` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type python -A 5 $'TaskExtractorConfig.load'

Length of output: 79


Script:

#!/bin/bash
# Description: Verify all function calls to `TaskExtractorConfig.load` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type py -A 5 $'TaskExtractorConfig.load'

Length of output: 2928


Script:

#!/bin/bash
# Verify the function signature in src/aces/config.py
rg --type py -A 10 'def load' src/aces/config.py

# Verify the usage in src/aces/__main__.py
rg --type py -A 5 'TaskExtractorConfig.load' src/aces/__main__.py

Length of output: 860

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant