updated populate fixing-investigation command #2500

Open
wants to merge 24 commits into base: develop

Conversation

DraKen0009
Contributor

@DraKen0009 DraKen0009 commented Sep 26, 2024

Proposed Changes

  • Fixed duplication of data in investigation group table
  • Fixed many to many relations when populating investigation data
  • Updated Investigation data and investigation group data

Associated Issue

Merge Checklist

  • Tests added/fixed
  • Linting Complete

Only PRs with test cases included and passing lint and test pipelines will be reviewed

@ohcnetwork/care-backend-maintainers @ohcnetwork/care-backend-admins

Summary by CodeRabbit

  • New Features

    • Introduced new investigation groups, expanding the medical testing categories available.
    • Added several new medical tests to enhance the breadth of investigations.
  • Improvements

    • Updated existing medical test parameters for accuracy, including ideal values and units.
    • Enhanced data loading processes for better efficiency and integrity in managing investigations and associated groups.
  • Bug Fixes

    • Resolved issues related to the relationship mapping between investigations and their groups.

@DraKen0009 DraKen0009 requested a review from a team as a code owner September 26, 2024 02:08

codecov bot commented Oct 7, 2024

Codecov Report

Attention: Patch coverage is 80.76923% with 10 lines in your changes missing coverage. Please review.

Project coverage is 69.46%. Comparing base (d6d069e) to head (20b6461).

Files with missing lines Patch % Lines
...ers/management/commands/populate_investigations.py 80.76% 6 Missing and 4 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2500      +/-   ##
===========================================
+ Coverage    69.20%   69.46%   +0.25%     
===========================================
  Files          211      211              
  Lines        11944    11970      +26     
  Branches      1208     1213       +5     
===========================================
+ Hits          8266     8315      +49     
+ Misses        3302     3273      -29     
- Partials       376      382       +6     


@bodhish
Member

bodhish commented Oct 18, 2024

@DraKen0009 do follow up and close this

@DraKen0009
Contributor Author

@sainak updated. Can you please run the workflows once?

@sainak
Member

sainak commented Oct 23, 2024

@DraKen0009 can you add a test case that validates the groups are being linked correctly

you can run the command using django.core.management.call_command

@DraKen0009
Contributor Author

So basically, after running the command, I need to loop over the data in the JSON and compare it with the data present in the model (DB)?

@sainak
Member

sainak commented Oct 23, 2024

So basically, after running the command, I need to loop over the data in the JSON and compare it with the data present in the model (DB)?

just check a few records

@DraKen0009
Contributor Author

@sainak can you please check the test cases and let me know if they require any changes?

Member

@rithviknishad rithviknishad left a comment


lgtm; found a tiny typo

care/facility/tests/test_management_commands.py (outdated)

coderabbitai bot commented Nov 7, 2024

📝 Walkthrough

Walkthrough

The changes in this pull request introduce a new test class LoadInvestigationsCommandTest to validate the functionality of the populate_investigations management command. The command's logic for loading investigation data has been restructured for efficiency, including checks for existing records before creating new ones. Additionally, multiple new entries have been added to the investigation_groups.json file, and extensive updates have been made to the investigations.json file to reflect new medical standards, including category ID adjustments, unit changes, and the introduction of new tests.

Changes

File Path | Change Summary
care/facility/tests/test_management_commands.py | New test class LoadInvestigationsCommandTest added to validate populate_investigations command functionality.
care/users/management/commands/populate_investigations.py | Restructured data loading logic, added checks for existing groups, updated handling of values, and improved relationship management.
data/investigation_groups.json | Added multiple new investigation groups with unique identifiers and names.
data/investigations.json | Extensive updates including category ID adjustments, unit changes, ideal value updates, new entries, and renaming for clarity.

Poem

In the land of tests and JSON files,
New groups and entries bring forth smiles.
Investigations now shine so bright,
With data restructured, all feels right.
So here's to the changes, a little cheer,
For clarity and order, we hold dear! 🎉



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (8)
data/investigation_groups.json (1)

29-80: Inconsistent capitalization in group names

Some observations about the naming conventions:

  • "Cerebro Spinal Fluid analysis" - "analysis" is not capitalized
  • "Gastric Analysis" - "Analysis" is capitalized
  • "Semen analysis" - "analysis" is not capitalized
  • "Stool Examination" - "Examination" is capitalized

Perhaps we should settle on one style? I'm leaning towards capitalizing all words in the group names.

-        "name": "Cerebro Spinal Fluid analysis"
+        "name": "Cerebro Spinal Fluid Analysis"
-        "name": "Semen analysis"
+        "name": "Semen Analysis"
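A check like the one described above could be automated rather than done by eye. The following is a minimal sketch, assuming the groups file is a list of objects with a "name" key; the helper name inconsistent_names and the inline sample are hypothetical, and only the four names quoted in this comment are used.

```python
import json

# Hypothetical sample shaped like data/investigation_groups.json
# (only the "name" key is used here; other keys are omitted).
SAMPLE = json.loads("""[
  {"name": "Cerebro Spinal Fluid analysis"},
  {"name": "Gastric Analysis"},
  {"name": "Semen analysis"},
  {"name": "Stool Examination"}
]""")


def inconsistent_names(groups):
    """Return group names that contain a word starting with a lowercase letter."""
    return [
        group["name"]
        for group in groups
        if any(word[0].islower() for word in group["name"].split())
    ]


flagged = inconsistent_names(SAMPLE)
print(flagged)
```

This only flags names for a human to review; it does not decide which capitalization style is correct.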
care/facility/tests/test_management_commands.py (2)

21-25: Just counting entries? How... minimalist.

While the count validation is a good start, it might be nice to add some basic data integrity checks.

Consider adding validation for essential fields:

def test_data_integrity(self):
    for investigation in self.investigations:
        db_investigation = PatientInvestigation.objects.get(name=investigation["name"])
        self.assertEqual(db_investigation.unit, investigation.get("unit", ""))
        self.assertEqual(db_investigation.ideal_value, investigation.get("ideal_value", ""))

27-48: Testing just 10 entries? That's... brave.

While the dictionary optimization for group lookups is commendable, the sample size seems a bit small for a proper validation.

Consider either:

  1. Testing all entries (preferred for critical data)
  2. Using a more representative random sample:
-        # taking first and last 5 data to test it out
-        test_investigation_data = self.investigations[:5] + self.investigations[-5:]
+        import random
+        # Testing ~20% of entries with a minimum of 10
+        sample_size = max(10, len(self.investigations) // 5)
+        test_investigation_data = random.sample(self.investigations, sample_size)

Also, that informal comment could be more... professional.

-        # taking first and last 5 data to test it out
+        # Sampling subset of investigations for relationship validation
care/users/management/commands/populate_investigations.py (3)

17-21: Perhaps we could add some basic error handling?

While I'm sure you've tested this thoroughly, it might be nice to handle scenarios where these files don't exist, instead of letting it crash and burn. Just a thought!

-        with Path("data/investigations.json").open() as investigations_data:
-            investigations = json.load(investigations_data)
+        try:
+            with Path("data/investigations.json").open() as investigations_data:
+                investigations = json.load(investigations_data)
+        except (FileNotFoundError, json.JSONDecodeError) as e:
+            raise CommandError(f"Failed to load investigations.json: {e}")

51-58: These try-except blocks look awfully repetitive...

We could consolidate these similar try-except blocks into a helper function. You know, DRY and all that...

+        def safe_float_conversion(value):
+            try:
+                return float(value)
+            except (ValueError, TypeError, KeyError):
+                return None
+
-            try:
-                data["min_value"] = float(investigation["min"])
-            except (ValueError, TypeError, KeyError):
-                data["min_value"] = None
-            try:
-                data["max_value"] = float(investigation["max"])
-            except (ValueError, TypeError, KeyError):
-                data["max_value"] = None
+            data["min_value"] = safe_float_conversion(investigation.get("min"))
+            data["max_value"] = safe_float_conversion(investigation.get("max"))
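For reference, here is a standalone sketch of the helper idea, runnable outside Django. The name parse_float and the sample record are assumptions for illustration. Since the caller uses .get(), a missing key arrives as None and is covered by TypeError, so catching KeyError may not be needed.

```python
def parse_float(value):
    """Convert value to float; return None when it is missing or not numeric."""
    try:
        return float(value)
    except (ValueError, TypeError):
        # ValueError: non-numeric string such as "n/a"
        # TypeError: None or another unsupported type
        return None


# Hypothetical record shaped like one entry of data/investigations.json
record = {"min": "4.5", "max": None}

min_value = parse_float(record.get("min"))
max_value = parse_float(record.get("max"))
absent_value = parse_float(record.get("absent"))
bad_value = parse_float("n/a")
print(min_value, max_value, absent_value, bad_value)
```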

102-105: The success message could be a tiny bit more informative...

Maybe we could tell users how many records were actually created/updated? Just a thought!

         if kwargs.get("verbosity", 1) > 0:
             self.stdout.write(
-                self.style.SUCCESS("Successfully populated investigation data")
+                self.style.SUCCESS(
+                    f"Successfully populated investigation data: "
+                    f"{len(bulk_create_data)} created, {len(bulk_update_data)} updated"
+                )
             )
data/investigations.json (2)

2277-2287: Fix multiline ideal value formatting

The Troponin ideal value contains Windows-style line endings (\r\n). For consistency, use Unix-style line endings.

-        "ideal": " Troponin I (cTnI) :0-0.08 ng/ml\r\n Ttroponin T (cTnT):0-0.01 ng/ml",
+        "ideal": "Troponin I (cTnI): 0-0.08 ng/ml\nTroponin T (cTnT): 0-0.01 ng/ml",

2743-2753: Fix trailing newlines in test name

The porphobilinogen test name contains multiple trailing newlines which could cause display issues.

-        "name": "porphobilinogen\n\n",
+        "name": "porphobilinogen",
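Both whitespace issues above could also be guarded against at load time instead of relying on the JSON being clean. This is a minimal sketch under that assumption; clean_text is a hypothetical helper, not part of the command in this PR, and it only normalizes whitespace and line endings (it does not correct spelling).

```python
def clean_text(value):
    """Strip each line, drop empty lines, and join with Unix newlines."""
    if value is None:
        return None
    # splitlines() treats "\r\n" and "\n" alike, so Windows endings disappear
    lines = [line.strip() for line in value.splitlines()]
    return "\n".join(line for line in lines if line)


name = clean_text("porphobilinogen\n\n")
ideal = clean_text(" first line\r\n second line")
print(repr(name), repr(ideal))
```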
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d6d069e and 3de9f41.

📒 Files selected for processing (4)
  • care/facility/tests/test_management_commands.py (1 hunks)
  • care/users/management/commands/populate_investigations.py (2 hunks)
  • data/investigation_groups.json (1 hunks)
  • data/investigations.json (127 hunks)
🔇 Additional comments (5)
data/investigation_groups.json (1)

29-80: Verify investigation group relationships

Let's ensure these new groups are properly linked to their respective investigations in the investigations.json file.

#!/bin/bash
# Check if all new group IDs (8-20) are referenced in investigations.json
echo "Checking references for new investigation groups..."
for id in {8..20}; do
    echo "Group $id:"
    rg -c "\"group\": *\"$id\"" investigations.json
done
care/facility/tests/test_management_commands.py (1)

1-8: Well-organized imports, I suppose.

The imports are clean and properly organized, which is... refreshing to see.

data/investigations.json (3)

708-710: Validate extreme range changes

Several tests have significantly increased max ranges:

  • GGT max increased to 300 U/L
  • ALT max increased to 1500 U/L
  • ALP max increased to 1200 U/L

While these could be valid for extreme clinical cases, it would be nice if someone could verify these ranges with medical professionals.

Consider adding a panic_range field to flag critically high values that require immediate attention.

Also applies to: 1736-1738, 1760-1762


1612-1614: Verify pediatric reference ranges

The ideal ranges for several tests (HBA1C, Haemoglobin, PCV/HCT) don't specify pediatric reference ranges. It would be really great if we could make these more comprehensive.

Consider adding age-specific ranges similar to how ALP ranges are specified.

Also applies to: 1623-1625, 1635-1637


29-30: Standardize unit notation for consistency

The unit notation has been changed from cell/cumm to cell/µl across multiple tests. While this is more accurate scientifically, ensure that all downstream systems and reports can properly handle the Unicode µ character. It would be slightly more robust to use cell/uL instead.

Also applies to: 42-43, 55-56, 68-69, 81-82


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (8)
data/investigation_groups.json (3)

Line range hint 1-28: Ah, the creative approach to capitalization continues...

I notice we're being quite... flexible with our capitalization standards:

  • "Biochemistry test" (id: 2)
  • "Kidney Function test" (id: 7)
  • "Liver Function Test" (id: 6)

Perhaps we could align these to be consistently capitalized? Unless there's some secret medical convention I'm not aware of? 😊

    {
        "id": "2",
-        "name": "Biochemistry test"
+        "name": "Biochemistry Test"
    },
    {
        "id": "7",
-        "name": "Kidney Function test"
+        "name": "Kidney Function Test"
    }

70-73: Consistency in terminology would be lovely

I see we have some entries with lowercase "analysis" while others use proper test naming conventions:

  • "Semen analysis"
  • "Stool Examination"

Shall we make them all follow the same pattern?

    {
        "id": "18",
-        "name": "Semen analysis"
+        "name": "Semen Analysis"
    }

Also applies to: 74-77


39-41: Abbreviation without context

"ABG" is used without its full form. While medical professionals might know this stands for Arterial Blood Gas, it might be helpful to include the full form for clarity.

    {
        "id": "10",
-        "name": "ABG"
+        "name": "ABG (Arterial Blood Gas)"
    }
care/users/management/commands/populate_investigations.py (3)

10-12: Perhaps we could make this function a tiny bit more robust?

While the function works, it might be nice to add some basic error handling for those special moments when files don't exist or contain invalid JSON.

 def load_json(file_path):
-    with Path(file_path).open() as json_file:
-        return json.load(json_file)
+    try:
+        with Path(file_path).open() as json_file:
+            return json.load(json_file)
+    except (FileNotFoundError, json.JSONDecodeError) as e:
+        raise CommandError(f"Failed to load {file_path}: {str(e)}")

Line range hint 84-124: Maybe add some progress reporting for those special long-running imports?

While the success message at the end is nice, it might be helpful to know what's happening during those potentially long-running bulk operations.

     def handle(self, *args, **kwargs):
+        verbosity = kwargs.get("verbosity", 1)
         investigation_groups = load_json("data/investigation_groups.json")
+        if verbosity > 0:
+            self.stdout.write(f"Loaded {len(investigation_groups)} investigation groups")
         investigations = load_json("data/investigations.json")
+        if verbosity > 0:
+            self.stdout.write(f"Loaded {len(investigations)} investigations")

Line range hint 1-124: A dash of logging would make production issues so much easier to debug...

Consider adding proper logging throughout the command for better observability in production. This would help track data inconsistencies and performance issues.

Key areas to add logging:

  • Number of records created/updated
  • Time taken for bulk operations
  • Any data validation issues
  • Skipped records due to validation failures
data/investigations.json (2)

29-29: Standardize unicode character usage

The unit changes from cell/cumm to cell/µl use different unicode representations for the micro symbol:

  • Some use \u03bcl
  • Others might use the literal µ character

Consider standardizing to use the unicode escape sequence \u03bc consistently across all measurements.

Also applies to: 42-42, 55-55, 68-68, 81-81
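To illustrate why the two representations matter: the escape \u03bc decodes to GREEK SMALL LETTER MU, while a literal micro symbol is often typed as MICRO SIGN (U+00B5). They render almost identically but compare unequal. The sketch below assumes the literal form in the data is U+00B5, which has not been verified against the file; normalize_unit is a hypothetical helper.

```python
import json
import unicodedata

MICRO_SIGN = "\u00b5"  # compatibility character, common on keyboards
GREEK_MU = "\u03bc"    # what the JSON escape \u03bc decodes to


def normalize_unit(unit):
    """Fold compatibility characters so both micro forms compare equal."""
    return unicodedata.normalize("NFKC", unit)


escaped = json.loads('"cell/\\u03bcl"')  # unit written with the escape sequence
literal = "cell/" + MICRO_SIGN + "l"     # unit written with a literal micro sign

print(escaped == literal)                  # the raw strings differ
print(normalize_unit(literal) == escaped)  # equal after NFKC normalization
```

NFKC maps the micro sign to the Greek letter, so normalizing on load would make comparisons independent of how the file was edited.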


Line range hint 1-11: Verify Blood Group categorization

The Blood Group test has an empty category_id array, which seems incorrect as it's a fundamental hematology test. Consider adding appropriate categories:

     "name": "Blood Group",
     "type": "Choice",
     "choices": "A-,A+,B+,B-,O+,O-,AB-,AB+",
     "unit": null,
     "ideal": null,
     "min": null,
     "max": null,
-    "category_id": []
+    "category_id": [1, 4]
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 3de9f41 and 20b6461.

📒 Files selected for processing (3)
  • care/users/management/commands/populate_investigations.py (2 hunks)
  • data/investigation_groups.json (1 hunks)
  • data/investigations.json (128 hunks)
🔇 Additional comments (2)
care/users/management/commands/populate_investigations.py (2)

35-51: Well, this actually looks quite nice!

Clean separation of concerns and proper error handling in parse_float. I approve.


59-81: Would be lovely to have some validation here...

The function efficiently handles bulk operations, but it might be nice to validate that category_ids actually exist in investigation_group_dict before proceeding.
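A validation step along these lines could look like the sketch below. It assumes, based on the snippets quoted in this review, that group ids are strings in investigation_groups.json and that category_id holds integers in investigations.json. The function name find_unknown_categories and the sample investigations ("Example A", "Example B") are hypothetical.

```python
def find_unknown_categories(investigations, group_ids):
    """Map investigation name -> category ids that have no matching group."""
    unknown = {}
    for investigation in investigations:
        missing = [
            category
            for category in investigation.get("category_id", [])
            if str(category) not in group_ids
        ]
        if missing:
            unknown[investigation["name"]] = missing
    return unknown


groups = [
    {"id": "2", "name": "Biochemistry test"},
    {"id": "6", "name": "Liver Function Test"},
    {"id": "7", "name": "Kidney Function test"},
]
investigations = [
    {"name": "Example A", "category_id": [2, 7]},
    {"name": "Example B", "category_id": [6, 99]},  # 99 has no group
    {"name": "Blood Group", "category_id": []},
]

group_ids = {group["id"] for group in groups}
unknown = find_unknown_categories(investigations, group_ids)
print(unknown)
```

Whether an unknown id should abort the command or only be reported is a design choice left to the maintainers.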

@DraKen0009
Contributor Author

@sainak @rithviknishad Can you please recheck? I followed some of the CodeRabbit reviews and reformatted the code to pass linting.

@vigneshhari
Member

Waiting on an ops confirmation.


Successfully merging this pull request may close these issues.

  • Bugs in populate investigations command
  • Expanding Investigations Seed list
5 participants