-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Suggestions on the initial toolbox PR #5
Conversation
- Add a description of the repo structure to the README - Add some potential requirements - Update the documentation and make sure it works - Create a notebook specific to loading gnomAD release data and just showing what each dataset looks like.
- Add a description of the repo structure to the README - Add some potential requirements - Update the documentation and make sure it works - Create a notebook specific to loading gnomAD release data and just showing what each dataset looks like.
- Addition of variant.py to store some functions that can also be used by frequency based filtering - Changes to frequency filtering to use new default settings if not passed a ht
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sending these over for now because I need 1 of them for testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for making this PR, your new structure of code makes more sense!
I think we can merge yours after some changes and I could work on the to-dos.
│ ├── intro_to_release_data.ipynb # Jupyter notebook introducing the loading of gnomAD release data. | ||
``` | ||
|
||
# TODO: Add fully detailed info about how to install and open the notebooks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These steps should include:
- install miniconda;
- set up a conda env for a specific version of Hail (and update JAVA);
- pip install gnomad_toolbox;
- set up the service account;
I can write this part once your PR is merged into mine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, but don't add anything yet. First step is to get a bunch of tickets in for todo's and we can decide who will tackle what
hl.utils.warning( | ||
f"No variant found at {variant.locus} with alleles {variant.alleles}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This warning is not working. Should we use import warnings
instead?
Could be:
warnings.warn(
f"No variant found at {variant.locus} with alleles {variant.alleles}",
RuntimeWarning,
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really? I got the warning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's just a hail/notebook thing.
# ) | ||
# ) | ||
|
||
# TODO: Consider this alternative approach to get the intervals from gencode. That |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your way is good, only a tiny problem, the gencode the Browser team used was preprocessed, I think they filtered out genes with the same ENSG that appear on both chrX and chrY (46 of them: 26 PAR, overlapping locus etc). We will get a different number of variants for those genes, e.x. https://gnomad.broadinstitute.org/gene/ENSG00000196433?dataset=gnomad_r4.
Do we want to add a prefilter function or just add a note?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, lets go with the faster way and a filter so we get the same numbers as the browser would get. For now lets just add a TODO and a ticket to do it
Co-authored-by: Qin He <44242118+KoalaQin@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
No description provided.