Add instructions to set up an env to use notebooks #7

Merged 39 commits on Jan 23, 2025
Commits
24b78ab
Add instructions to set up the toolbox and use notebooks
KoalaQin Dec 18, 2024
1b9f98d
small edit
KoalaQin Dec 18, 2024
8f49b09
Reorganize steps
KoalaQin Dec 19, 2024
1d811e6
small edit
KoalaQin Dec 19, 2024
18d0a85
small edit 2
KoalaQin Dec 19, 2024
a68b7fd
Specify Java version
KoalaQin Dec 19, 2024
7248a81
Undo weird changes to notebooks
KoalaQin Dec 19, 2024
7776c23
Formatting
KoalaQin Dec 19, 2024
fef0c8e
Merge remote-tracking branch 'origin/development' into qh/readme
KoalaQin Jan 10, 2025
f44a79f
Add a name for the gnomad_methods main branch so it can be used to se…
KoalaQin Jan 10, 2025
306daf7
Add jupyter server version limit to avoid jupyter notebook 403 error
KoalaQin Jan 15, 2025
f7312eb
Merge branch 'development' of https://github.com/broadinstitute/gnoma…
jkgoodrich Jan 16, 2025
aabf574
Add changes to the README.md and support for jupyter configs
jkgoodrich Jan 21, 2025
b9b5095
Add jupyter configs
jkgoodrich Jan 21, 2025
a1eece8
Move jupyter configs
jkgoodrich Jan 21, 2025
18b1a96
Modify the Jupyter config file to set the notebook directory.
jkgoodrich Jan 21, 2025
d9f1039
Add nbconfig
jkgoodrich Jan 21, 2025
9ea6b53
Use recursive glob
jkgoodrich Jan 22, 2025
c22aac0
I don't think the MANIFEST.in is needed
jkgoodrich Jan 22, 2025
6123429
Small changes in README.md
jkgoodrich Jan 22, 2025
25516a3
Make sure to install hail
jkgoodrich Jan 22, 2025
957c149
Add resources to README.md
jkgoodrich Jan 22, 2025
b7e6cfa
Change to use the Cloud Storage Connector
jkgoodrich Jan 22, 2025
f67bd31
Add image for README.md
jkgoodrich Jan 22, 2025
87ca520
Use correct name for the jupyter notebook -- run all cells
jkgoodrich Jan 22, 2025
8e1f73b
Formatting and clean-up of the README.md
jkgoodrich Jan 22, 2025
47d54fa
A bit more README.md clean-up
jkgoodrich Jan 22, 2025
efe069b
Update the Java prereq section
jkgoodrich Jan 22, 2025
f8fac36
Wrap lines for easier reading
jkgoodrich Jan 22, 2025
3527120
Apply suggestions from code review
jkgoodrich Jan 22, 2025
d662909
Add more infor to java install
jkgoodrich Jan 22, 2025
e68e579
Merge branch 'jg/readme_changes_and_add_configs' of https://github.co…
jkgoodrich Jan 22, 2025
fc766e3
Add Zulip to resources
jkgoodrich Jan 22, 2025
a749531
Add more info about running notebooks locally
jkgoodrich Jan 22, 2025
a4866dc
Small format addition
jkgoodrich Jan 22, 2025
6207d60
Apply suggestions from code review
jkgoodrich Jan 22, 2025
25f108b
Align comments in repo structure
jkgoodrich Jan 22, 2025
bc46c3b
Remove toc from config, we have toc2
jkgoodrich Jan 22, 2025
85e5128
Merge pull request #18 from broadinstitute/jg/readme_changes_and_add_…
jkgoodrich Jan 23, 2025
241 changes: 218 additions & 23 deletions README.md
@@ -1,33 +1,228 @@
# gnomad-toolbox
This repository provides a set of Python functions to simplify working with gnomAD Hail Tables. It includes tools for data access, filtering, and analysis.
# gnomad-toolbox: Simplifying Access and Analysis of gnomAD Data

![License](https://img.shields.io/github/license/broadinstitute/gnomad-toolbox)

The Genome Aggregation Database ([gnomAD](https://gnomad.broadinstitute.org/)) is a widely used resource for understanding genetic variation, offering large-scale data on millions of variants across diverse populations. This toolbox is a Python package designed to streamline use of gnomAD data, simplifying tasks like loading, filtering, and analysis, to make it more accessible to researchers.

> **Disclaimer:** This package is in its early stages of development, and we are actively working on improving it.
> There may be bugs, and the API is subject to change. Feedback and contributions are highly encouraged.

---

## Repository Structure

The package is organized as follows:

## Repository structure
```
ggnomad_toolbox/
gnomad_toolbox/
├── load_data.py # Functions to load gnomAD release Hail Tables.
├── filtering/
│ ├── __init__.py
│ ├── constraint.py # Functions to filter constraint metrics (e.g., observed/expected ratios).
│ ├── coverage.py # Functions to filter variants or regions based on coverage thresholds.
│ ├── frequency.py # Functions to filter variants by allele frequency thresholds.
│ ├── pext.py # Functions to filter variants using predicted expression (pext) scores.
│ ├── variant.py # Functions to filter to a specific variant or set of variants.
│ ├── vep.py # Functions to filter variants based on VEP (Variant Effect Predictor) annotations.
├── filtering/ # Modules for filtering gnomAD data.
│ ├── constraint.py # Filter by constraint metrics (e.g., observed/expected ratios).
│ ├── coverage.py # Filter by coverage thresholds.
│ ├── frequency.py # Filter by allele frequency thresholds.
│ ├── pext.py # Filter by predicted expression (pext) scores.
│ ├── variant.py # Filter specific variants or sets of variants.
│ ├── vep.py # Filter by VEP (Variant Effect Predictor) annotations.
├── analysis/
│ ├── __init__.py
│ ├── general.py # General analysis functions, such as summarizing variant statistics.
├── analysis/ # Analysis functions.
│ ├── general.py # General-purpose analyses, such as summarizing variant statistics.
├── notebooks/
│ ├── intro_to_release_data.ipynb # Jupyter notebook introducing the loading of gnomAD release data.
├── notebooks/ # Example Jupyter notebooks.
│ ├── explore_release_data.ipynb # Guide to loading gnomAD release data.
│ ├── intro_to_filtering_variant_data.ipynb # Introduction to filtering gnomAD variants.
│ ├── dive_into_secondary_analyses.ipynb # Secondary analyses using gnomAD data.
```

# TODO: Add fully detailed info about how to install and open the notebooks.
## Getting started
### Install
pip install -r requirements.txt
---

## Set Up Your Environment for Hail and gnomAD Toolbox

This section provides step-by-step instructions to set up a working environment for using [Hail](https://hail.is/) and the gnomAD Toolbox.

> This guide is meant to help you set up your environment, but we cannot guarantee that it will work on all systems.
> If you run into problems, reach out on the [gnomAD Forum](https://discuss.gnomad.broadinstitute.org);
> if it is an issue we have seen before, we will try to help.

### Prerequisites

Before installing the toolbox, ensure the following:
- Administrator access to install software.
- A working internet connection.
- Java **11**.
- Check your Java version:
```commandline
java -version
```
- If you do not have Java 11 installed:
- For Linux, use `apt-get` or `yum` to install OpenJDK 11.
- For macOS, [Hail recommends](https://hail.is/docs/0.2/install/macosx.html) using [Homebrew](https://brew.sh/):
```commandline
brew tap homebrew/cask-versions
brew install --cask temurin11
```
or using a packaged installation from [Azul](https://www.azul.com/downloads/?version=java-11-lts&os=macos&package=jdk&show-old-builds=true).
> Ensure you choose a Java installation that matches your system architecture (found in **Apple Menu > About This Mac**).
> - For Apple M1/M2 chips, select an **arm64** Java package.
> - For Intel-based Macs, choose an **x86_64** Java package.
>
> You may also need to set the `JAVA_HOME` environment variable to the path of the installed Java version. For example:
> ```commandline
> export JAVA_HOME=/Library/Java/JavaVirtualMachines/zulu-11.jdk/Contents/Home
> export PATH=$JAVA_HOME/bin:$PATH
> ```
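
The Java check above can also be scripted. The following is a minimal sketch, not part of the toolbox: the helper names `parse_java_major` and `installed_java_major` are hypothetical, and the parsing assumes the usual `version "..."` line that most JDKs print.

```python
import re
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier report themselves as "1.8.0_...", so use the second field.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major() -> Optional[int]:
    """Run `java -version`; return the major version, or None if it cannot be determined."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    # Most JDKs write the version banner to stderr, so inspect both streams.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    print("Java 11 found" if major == 11 else f"Expected Java 11, found: {major}")
```

If this reports a version other than 11, adjusting `JAVA_HOME` as described above is the first thing to try.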

### Install Miniconda

Miniconda is a lightweight distribution of Conda.

1. Download Miniconda from the [official website](https://docs.anaconda.com/miniconda/install/).
2. Follow the installation instructions described on the download page for your operating system.
3. Verify installation:
```commandline
conda --version
```

### Set Up a Conda Environment

Create and activate a new environment with a specified Python version for the gnomAD Toolbox:
```commandline
conda create -n gnomad-toolbox python=3.11
conda activate gnomad-toolbox
```

### Install gnomAD Toolbox
- To install from PyPI:
```commandline
pip install gnomad-toolbox
```
- To install the latest development version from GitHub:
```commandline
pip install git+https://github.com/broadinstitute/gnomad-toolbox@main
```

> **Troubleshooting:** If you encounter an error such as `Error: pg_config executable not found`, install the
> `postgresql` package:
> ```commandline
> conda install postgresql
> ```


### Verify the Installation

Start a Python shell and ensure that Hail and the gnomAD Toolbox are set up correctly:
```python
import hail as hl
import gnomad_toolbox
hl.init()
print("Hail and gnomad_toolbox setup is complete!")
```
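
If `hl.init()` fails, it can help to first confirm that both packages are visible to the interpreter you are running, without starting Hail at all. This is a small sketch using only the standard library; `missing_packages` is a hypothetical helper name, not a toolbox function.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Show which interpreter is in use; it should live inside the conda environment.
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} at {sys.executable}")
    missing = missing_packages(["hail", "gnomad_toolbox"])
    print("Missing: " + ", ".join(missing) if missing else "hail and gnomad_toolbox are importable")
```

A missing package usually means `pip install` ran in a different environment than the one currently active.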

---

## Available Example Notebooks

The gnomAD Toolbox includes Jupyter notebooks to help you get started with gnomAD data:

- **Explore Release Data:**
- Learn how to load and inspect gnomAD release data.
- Notebook: `explore_release_data.ipynb`

- **Filter Variants:**
- Understand how to filter variants using different criteria.
- Notebook: `intro_to_filtering_variant_data.ipynb`

- **Perform Secondary Analyses:**
- Explore more advanced analyses using gnomAD data.
- Notebook: `dive_into_secondary_analyses.ipynb`

---

## Run the Example Notebooks Locally
> If you already have experience with Google Cloud and using Jupyter notebooks, you can skip this section and use the
> notebooks in your preferred environment.

Hail can be [initialized](https://hail.is/docs/0.2/api.html#hail.init) with different backends depending on
where you want to run your analysis. For analyses that require a lot of computational resources, a cloud-based
environment will be most suitable.

However, the gnomAD Toolbox example notebooks can be run locally using the
`local` backend. At the beginning of each notebook, Hail is initialized with the `local` backend using:
```python
hl.init(backend="local")
```

To run the example notebooks locally, there are a few additional steps needed to set up your environment:

### Install the Cloud Storage Connector
The gnomAD Hail Tables are stored in Google Cloud Storage. To avoid downloading the entire dataset to your local machine,
we recommend using the [Google Cloud Storage Connector](https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage)
to access the data.

The easiest way to install the connector is to use the `install-gcs-connector` script provided by the Broad Institute:
```commandline
curl -sSL https://broad.io/install-gcs-connector | python3 - --auth-type UNAUTHENTICATED
```

### Copy and Open the Notebooks

1. Copy the notebooks to a directory of your choice:
```commandline
copy-gnomad-toolbox-notebooks /path/to/your/notebooks
```
> If the specified directory already exists, either provide a different path or add the `--overwrite` flag to
> overwrite the existing directory:
> ```commandline
> copy-gnomad-toolbox-notebooks /path/to/your/notebooks --overwrite
> ```

2. Start Jupyter with gnomad-toolbox specific configurations:
- For Jupyter Notebook:
```commandline
gnomad-toolbox-jupyter notebook
```
- For Jupyter Lab:
```commandline
gnomad-toolbox-jupyter lab
```

> These commands will start a Jupyter notebook/lab server and open a new tab in your default web browser. The
> notebook directory containing the example notebooks will be displayed.

3. Open the `explore_release_data.ipynb` notebook to learn how to load gnomAD release data:
- Run all cells by clicking on the >> button in the toolbar (shown in the image below) or by selecting "Run All"
from the "Cell" menu.
![jupyter notebook -- run all cells](images/jupyter_run_all.png)

4. Explore the other notebooks described above.

5. Try adding your own queries to the notebooks to explore the data further.
> **WARNING:** Avoid running queries on the full dataset as it may take a long time.
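
One way to follow the warning above is to restrict a table to a small genomic region before running anything expensive. The sketch below uses plain Hail calls (`hl.parse_locus_interval`, `hl.filter_intervals`) rather than toolbox functions. The helper names are hypothetical, and `ht_path` is a placeholder for the path of a locus-keyed gnomAD Hail Table that you supply, such as one listed on the gnomAD download page.

```python
def locus_interval(contig: str, start: int, end: int) -> str:
    """Format a region as a Hail locus-interval string, e.g. 'chr1:100-200'."""
    if start < 1 or end < start:
        raise ValueError(f"invalid interval: {contig}:{start}-{end}")
    return f"{contig}:{start}-{end}"


def restrict_to_region(ht, contig: str, start: int, end: int, reference_genome: str = "GRCh38"):
    """Keep only the rows of a locus-keyed Hail Table that fall inside one region."""
    import hail as hl  # Imported lazily so locus_interval() works without Hail installed.

    interval = hl.parse_locus_interval(
        locus_interval(contig, start, end), reference_genome=reference_genome
    )
    return hl.filter_intervals(ht, [interval])


def demo(ht_path: str) -> None:
    """Read a table, narrow it to a small region, and preview a few rows."""
    import hail as hl

    hl.init(backend="local")
    ht = hl.read_table(ht_path)
    # The coordinates are an arbitrary illustrative region, not a recommendation.
    small = restrict_to_region(ht, "chr1", 55039447, 55064852)
    small.show(5)
```

Calling `demo(...)` with a table path previews five rows from the region. Filtering by interval first means Hail only needs to read the partitions that overlap the region instead of scanning the full dataset.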

---

## Resources

### gnomAD:
- [gnomAD Toolbox Documentation](https://broadinstitute.github.io/gnomad-toolbox/)
- [gnomAD Browser](https://gnomad.broadinstitute.org/)
- [gnomAD Download Page](https://gnomad.broadinstitute.org/downloads)
- [gnomAD Forum](https://discuss.gnomad.broadinstitute.org)

### Hail:
- [Hail Documentation](https://hail.is/docs/0.2/index.html)
- [Hail Discussion Forum](https://discuss.hail.is/)
- [Hail Zulip Chat](https://hail.zulipchat.com/)

---

## Contributing

We welcome contributions to the gnomAD Toolbox! See the [CONTRIBUTING.md](CONTRIBUTING.md) file for more information.

---

## License

### Opening the notebooks
jupyter lab
This project is licensed under the BSD 3-Clause License. See the [LICENSE](LICENSE) file for details.
7 changes: 7 additions & 0 deletions gnomad_toolbox/configs/jupyter_notebook_config.json
@@ -0,0 +1,7 @@
{
"NotebookApp": {
"nbserver_extensions": {
"jupyter_nbextensions_configurator": true
}
}
}
@@ -0,0 +1,5 @@
{
"numberingH1": false,
"includeOutput": false,
"syncCollapseState": false
}
7 changes: 7 additions & 0 deletions gnomad_toolbox/configs/nbconfig/notebook.json
@@ -0,0 +1,7 @@
{
"load_extensions": {
"nbextensions_configurator/config_menu/main": true,
"contrib_nbextensions_help_item/main": true,
"toc2/main": true
}
}
5 changes: 5 additions & 0 deletions gnomad_toolbox/configs/nbconfig/tree.json
@@ -0,0 +1,5 @@
{
"load_extensions": {
"nbextensions_configurator/tree_tab/main": true
}
}
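
For reviewers who want to confirm which extensions the `nbconfig` files switch on, here is a short sketch that reads a config of the shape shown above. `enabled_extensions` is a hypothetical helper, and the embedded JSON is copied from the `notebook.json` diff; nothing here is part of the package.

```python
import json


def enabled_extensions(config_text: str) -> list:
    """Return the sorted names of extensions set to true under "load_extensions"."""
    config = json.loads(config_text)
    return sorted(
        name for name, on in config.get("load_extensions", {}).items() if on is True
    )


# Contents of gnomad_toolbox/configs/nbconfig/notebook.json as added in this PR.
notebook_json = """
{
  "load_extensions": {
    "nbextensions_configurator/config_menu/main": true,
    "contrib_nbextensions_help_item/main": true,
    "toc2/main": true
  }
}
"""

print(enabled_extensions(notebook_json))
```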
2 changes: 1 addition & 1 deletion gnomad_toolbox/notebooks/explore_release_data.ipynb
@@ -14648,7 +14648,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
"version": "3.11.11"
},
"toc": {
"base_numbering": 1,