Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestions on the initial toolbox PR #5

Merged
merged 8 commits into from
Dec 18, 2024

Conversation

jkgoodrich
Copy link
Contributor

No description provided.

- Add a description of the repo structure to the README
- Add some potential requirements
- Update the documentation and make sure it works
- Create a notebook specific to loading gnomAD release data and just showing what each dataset looks like.
- Add a description of the repo structure to the README
- Add some potential requirements
- Update the documentation and make sure it works
- Create a notebook specific to loading gnomAD release data and just showing what each dataset looks like.
- Addition of variant.py to store some functions that can also be used by frequency based filtering
- Changes to frequency filtering to use new default settings if not passed a ht
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link

@KoalaQin KoalaQin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sending these over for now because I need 1 of them for testing.

gnomad_toolbox/load_data.py Outdated Show resolved Hide resolved
gnomad_toolbox/load_data.py Outdated Show resolved Hide resolved
gnomad_toolbox/load_data.py Outdated Show resolved Hide resolved
gnomad_toolbox/load_data.py Outdated Show resolved Hide resolved
gnomad_toolbox/load_data.py Outdated Show resolved Hide resolved
gnomad_toolbox/analysis/general.py Outdated Show resolved Hide resolved
gnomad_toolbox/analysis/general.py Outdated Show resolved Hide resolved
Copy link

@KoalaQin KoalaQin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making this PR, your new structure of code makes more sense!
I think we can merge yours after some changes and I could work on the to-dos.

│ ├── intro_to_release_data.ipynb # Jupyter notebook introducing the loading of gnomAD release data.
```

# TODO: Add fully detailed info about how to install and open the notebooks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These steps should include:

  1. install miniconda;
  2. set up a conda env for a specific version of Hail (and update JAVA);
  3. pip install gnomad_toolbox;
  4. set up the service account;

I can write this part once your PR is merged into mine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, but don't add anything yet. First step is to get a bunch of tickets in for todo's and we can decide who will tackle what

gnomad_toolbox/analysis/general.py Show resolved Hide resolved
Comment on lines +74 to +76
hl.utils.warning(
f"No variant found at {variant.locus} with alleles {variant.alleles}"
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This warning is not working. Should we use import warnings instead?
Could be:

    warnings.warn(
        f"No variant found at {variant.locus} with alleles {variant.alleles}",
        RuntimeWarning,
    )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really? I got the warning

Copy link

@KoalaQin KoalaQin Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's working, but it was printed after the init cell. Screenshot 2024-12-18 at 1 17 25 PM
I didn't see it because I was far down the notebook.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just a hail/notebook thing.

# )
# )

# TODO: Consider this alternative approach to get the intervals from gencode. That

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your way is good, only a tiny problem, the gencode the Browser team used was preprocessed, I think they filtered out genes with the same ENSG that appear on both chrX and chrY (46 of them: 26 PAR, overlapping locus etc). We will get a different number of variants for those genes, e.x. https://gnomad.broadinstitute.org/gene/ENSG00000196433?dataset=gnomad_r4.
Do we want to add a prefilter function or just add a note?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, lets go with the faster way and a filter so we get the same numbers as the browser would get. For now lets just add a TODO and a ticket to do it

jkgoodrich and others added 2 commits December 18, 2024 09:01
Co-authored-by: Qin He <44242118+KoalaQin@users.noreply.github.com>
@jkgoodrich jkgoodrich requested a review from KoalaQin December 18, 2024 16:09
Copy link

@KoalaQin KoalaQin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jkgoodrich jkgoodrich merged commit d09953c into qh/draft_toolbox Dec 18, 2024
2 checks passed
@jkgoodrich jkgoodrich deleted the jg/draft_toolbox branch December 18, 2024 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants