Streamline predicate/analysis workflow #88

Merged
merged 22 commits into from
Nov 13, 2023

Conversation

ielis

@ielis ielis commented Oct 23, 2023

Fixes #87 , #92

Depends on #94


ielis commented Oct 26, 2023

@lnrekerle

I'm proposing a revamp to the CohortAnalyzer.

The CohortAnalyzer is an abstraction: a promise of what the analyzer can do for the user. To get a CohortAnalyzer, we use a pattern similar to the one for configuring PhenopacketPatientCreator. There is a configuration function that gives you a CohortAnalyzer:

from genophenocorr.analysis import configure_cohort_analysis

analysis = configure_cohort_analysis(cohort, hpo)

You'll get an analysis with default options. If you want to tweak the options, build the CohortAnalysisConfiguration:

from genophenocorr.analysis import CohortAnalysisConfiguration

configuration = (
    CohortAnalysisConfiguration.builder()
    .include_sv(True)
    .pval_correction('fdr_bh')
    .build()
)

analysis = configure_cohort_analysis(cohort, hpo, configuration)
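For readers unfamiliar with the builder pattern used above, here is a minimal, self-contained sketch of how such a chained configuration can work. The `SketchConfiguration` and `SketchConfigurationBuilder` classes are hypothetical stand-ins, not the actual genophenocorr classes, and the default values are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchConfiguration:
    """Hypothetical stand-in for CohortAnalysisConfiguration (immutable once built)."""
    include_sv: bool = False               # assumed default, for illustration
    pval_correction: Optional[str] = None  # assumed default, for illustration

    @staticmethod
    def builder() -> "SketchConfigurationBuilder":
        return SketchConfigurationBuilder()


class SketchConfigurationBuilder:
    """Collects options; every setter returns self so the calls can be chained."""

    def __init__(self) -> None:
        self._include_sv = False
        self._pval_correction: Optional[str] = None

    def include_sv(self, include_sv: bool) -> "SketchConfigurationBuilder":
        self._include_sv = include_sv
        return self

    def pval_correction(self, method: Optional[str]) -> "SketchConfigurationBuilder":
        self._pval_correction = method
        return self

    def build(self) -> SketchConfiguration:
        # Freeze the collected options into an immutable configuration object.
        return SketchConfiguration(self._include_sv, self._pval_correction)


# The surrounding parentheses let the chained calls span several lines.
config = (
    SketchConfiguration.builder()
    .include_sv(True)
    .pval_correction('fdr_bh')
    .build()
)
```

Keeping the built configuration immutable means an analysis cannot have its options changed after it was configured, which is one common reason to prefer a builder over setters on the configuration itself.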

Then we run the analysis, e.g. to compare MISSENSE vs others:

from genophenocorr.model import VariantEffect
from genophenocorr.analysis.predicate import BooleanPredicate

results = analysis.compare_by_variant_effect(VariantEffect.MISSENSE_VARIANT, tx_id='NM_1234.5')
result_df = results.summarize(hpo, BooleanPredicate.YES)
result_df.head()

We get results, a container with a lot of data. We call summarize to prepare a data frame of phenotypes vs. genotypes, ordered by corrected p-values.

Note that we provide BooleanPredicate.YES to show the genotype-phenotype correlation for HPO terms that are present. To show it for terms that are not present, we would use BooleanPredicate.NO.
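To make "ordered by corrected p-values" concrete, here is a small pure-Python sketch of the Benjamini-Hochberg adjustment. It assumes that 'fdr_bh' denotes Benjamini-Hochberg, as the method name does in statsmodels' `multipletests`; it is not the code the PR uses. The term labels and p-values are made up for illustration.

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    # Indices of the p-values, from the smallest to the largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy input: hypothetical phenotype labels with made-up nominal p-values.
terms = ['term_a', 'term_b', 'term_c', 'term_d']
nominal = [0.01, 0.04, 0.03, 0.20]
corrected = benjamini_hochberg(nominal)

# Order the rows by corrected p-value, as the summary data frame does.
ranked = sorted(zip(terms, nominal, corrected), key=lambda row: row[2])
```

Note that `term_b` and `term_c` end up with the same corrected value here, because the adjustment never lets a smaller nominal p-value receive a larger corrected one.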

This is what the PR adds. Thanks to the changes, we have a general framework for applying genotype and phenotype predicates and showing the results.
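As a rough illustration of the idea of applying a genotype predicate and a phenotype predicate to each patient, consider the sketch below. `SketchPatient`, `count_by_predicates`, and the two predicate functions are hypothetical names invented for this example and do not reflect the PR's actual classes; the cohort is toy data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable


@dataclass(frozen=True)
class SketchPatient:
    """Hypothetical, heavily simplified patient record."""
    variant_effects: FrozenSet[str]
    present_terms: FrozenSet[str]


Predicate = Callable[[SketchPatient], bool]


def count_by_predicates(
    patients: Iterable[SketchPatient],
    genotype_predicate: Predicate,
    phenotype_predicate: Predicate,
) -> Counter:
    """Count patients per (genotype, phenotype) predicate outcome."""
    counts: Counter = Counter()
    for patient in patients:
        key = (genotype_predicate(patient), phenotype_predicate(patient))
        counts[key] += 1
    return counts


def has_missense(patient: SketchPatient) -> bool:
    return 'MISSENSE_VARIANT' in patient.variant_effects


def has_term(patient: SketchPatient) -> bool:
    # 'HP:0001250' serves only as an example term id.
    return 'HP:0001250' in patient.present_terms


# Toy cohort, made up for illustration.
cohort = [
    SketchPatient(frozenset({'MISSENSE_VARIANT'}), frozenset({'HP:0001250'})),
    SketchPatient(frozenset({'MISSENSE_VARIANT'}), frozenset()),
    SketchPatient(frozenset({'FRAMESHIFT_VARIANT'}), frozenset({'HP:0001250'})),
    SketchPatient(frozenset({'FRAMESHIFT_VARIANT'}), frozenset()),
    SketchPatient(frozenset({'MISSENSE_VARIANT'}), frozenset({'HP:0001250'})),
]

table = count_by_predicates(cohort, has_missense, has_term)
```

The resulting counts form a 2x2 contingency table, which is the kind of input a test such as Fisher's exact test would take before any multiple-testing correction is applied.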

Please check out the code and try it out; we can discuss it in greater detail next time.

@ielis ielis marked this pull request as ready for review October 26, 2023 16:50
@ielis ielis linked an issue Oct 26, 2023 that may be closed by this pull request
ielis added 4 commits November 1, 2023 22:05
# Conflicts:
#	src/genophenocorr/analysis/predicate/_all_predicates.py
#	src/genophenocorr/model/_cohort.py
#	src/genophenocorr/model/_variant.py
…e of `ProteinMetadata.get_features_variant_overlaps()`.

ielis commented Nov 2, 2023

Now that develop is merged into the PR branch, we should be OK to move forward with this PR if the code looks good.

@ielis ielis merged commit f588591 into develop Nov 13, 2023
4 checks passed
@ielis ielis deleted the work-on-predicates branch November 13, 2023 16:28
Development

Successfully merging this pull request may close these issues.

Review CohortAnalysis._remove_low_hpo_terms
Streamline predicate/analysis workflow