Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: cleaner data handling and improved logic for sample_qc_table.py (issue #324) #345

Merged

Conversation

jaamarks
Copy link
Collaborator

@jaamarks jaamarks commented Oct 8, 2024

This PR builds upon the changes introduced in PR #327 to further refine the sample_qc_table.py script and address issues related to sample retention.

What

  • Refactor: Removed the use of "Sample_ID" as the index in _retain_valid_discordant_replicates function.
  • Data Type Fixes: Resolved various datatype issues throughout the code.
  • Function Order Change: Reordered the _add_identifiler function to follow the _add_analytic_exclusion function.
  • Column Update: Replaced "is_cr2_filtered" column to the "is_call_rate_filtered" column in the _retain_valid_discordant_replicates function.
  • Test Updates: Updated the corresponding tests in tests/workflow/scripts/test_sample_qc_table.py. Added the "is_call_rate_filtered" column to the fake_sample_qc data fixture.

Why

  • Issues arose from using "Sample_ID" as the index in _retain_valid_discordant_replicates. Refactoring the code to eliminate this dependency resolves those issues.
  • Bugs related to datatypes were fixed by converting the variables to the correct data types.
  • Reordering the _add_identifiler function to follow the _add_analytic_exclusion function allows the _retain_valid_discordant_replicates function to run first, accommodating any status changes to "is_discordant_replicates".
  • Using the "is_call_rate_filtered" column instead of the "is_cr2_filtered" provides clearer indication of whether a sample failed either call rate filter.
  • Since we are now referencing the "is_call_rate_filtered" column in the _retain_valid_discordant_replicates function, we aligned our test data with this new expectation.

Fixes #324

- Improved data type handling
- Updated corresponding tests in ``test_sample_qc_table.py``

Overall logic of retaining valid discordant replicates remains the same,
but this code fixes some dtype issues with the last implementation.
@jaamarks jaamarks merged commit c618175 into NCI-CGR:default Oct 9, 2024
2 checks passed
@jaamarks jaamarks deleted the issue324-revisit-exp-rep-retainment branch October 9, 2024 13:51
@jaamarks jaamarks changed the title bugfix: cleaner data handling and improved logic for sample_qc_table.py (issue #324) fix: cleaner data handling and improved logic for sample_qc_table.py (issue #324) Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Discordant expected dup removal - only apply to replicates pass call rate & contamination
1 participant