i2b2tranSMART Foundation Harvard Symposium 2019: Hail workshop material

Harvard Medical School - Department of Biomedical Informatics - Boston, MA

Workshop Structure

This workshop is divided into 3 modules:

Module 1. Introduction to Hail:

  • Deploying clusters or installing locally
  • Basic concepts
  • Hail data structure and functions
  • Aggregations
  • Basic plotting

Module 2. GWAS in 5 steps

Module 3. Annotation, PCA and variant discovery

Sample code for each module can be found in the notebooks folder; a minimal sketch of the overall workflow appears below.
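
For a taste of what Modules 2 and 3 build up to, here is a minimal, hedged sketch of a Hail 0.2 GWAS. The dataset path and the CaffeineConsumption phenotype are placeholders modeled on Hail's public tutorial data, not files shipped with this repository:

import hail as hl

hl.init()

# Load a MatrixTable (placeholder path; the workshop notebooks provide their own data)
mt = hl.read_matrix_table('data/1kg.mt')

# Steps 1-2: basic sample and variant QC
mt = hl.sample_qc(mt)
mt = mt.filter_cols(mt.sample_qc.call_rate > 0.97)
mt = hl.variant_qc(mt)
mt = mt.filter_rows(mt.variant_qc.AF[1] > 0.01)  # keep common variants

# Step 3: regress the phenotype on genotype dosage (intercept as covariate)
gwas = hl.linear_regression_rows(
    y=mt.pheno.CaffeineConsumption,  # hypothetical phenotype column field
    x=mt.GT.n_alt_alleles(),
    covariates=[1.0])

# Step 4: inspect the strongest associations
gwas.order_by(gwas.p_value).show(5)

# Step 5: Manhattan plot (returns a bokeh figure)
p = hl.plot.manhattan(gwas.p_value)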

Hail 0.2

All documentation for Hail can be found on its website. Hail has two main data representations: Table and MatrixTable. All data in Hail is stored in one of these two structures, so it is important to learn their features and characteristics. The Overview page is also a good starting point for new users.
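
As a minimal sketch of these two structures and of the aggregations covered in Module 1 (using synthetic data from hl.balding_nichols_model, so it runs anywhere without downloads):

import hail as hl

hl.init()

# A MatrixTable is two-dimensional: rows (variants) x columns (samples)
mt = hl.balding_nichols_model(n_populations=3, n_samples=50, n_variants=100)
mt.describe()  # print the row, column, and entry fields

# Each axis can be viewed as a Table
mt.rows().show(5)

# Aggregations run over an axis: e.g. samples per simulated population...
print(mt.aggregate_cols(hl.agg.counter(mt.pop)))

# ...or the mean alternate-allele count over all entries
print(mt.aggregate_entries(hl.agg.mean(mt.GT.n_alt_alleles())))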

Please note that the production release is version 0.2. A Forum and a Chat are available for support and for interacting with the Hail community.


Tools

Local installation on macOS using the command line

Execute the following commands in your terminal:

## Install Conda 
cd ~/Desktop # or your path of preference
git clone https://github.com/hms-dbmi/hail-workshop-2019.git
cd hail-workshop-2019
wget https://repo.anaconda.com/archive/Anaconda3-2019.03-MacOSX-x86_64.sh

# If you don't have wget, install it with "brew install wget" or download manually from: https://repo.anaconda.com/archive/Anaconda3-2019.03-MacOSX-x86_64.sh

# For a lighter installation you can use Miniconda instead: 
# https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh

sh Anaconda3-2019.03-MacOSX-x86_64.sh

# Follow the installer prompts.
# For the question "Do you wish the installer to initialize Anaconda3 by running conda init?" we recommend answering "yes".

rm Anaconda3-2019.03-MacOSX-x86_64.sh

# Check whether the Java 1.8 JDK is installed. If it is not, download it from:
# https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
# (Java SE 8 JDK > Accept the license > Mac OS X x64 .dmg)
java -version

# After Java is installed, source your shell profile
source ~/.bash_profile # For bash 
source ~/.zshrc # For Zsh

# Create Hail environment with Python 3.7 and Jupyter Lab
conda create --name hail python=3.7
conda activate hail
python -V # Make sure this prints Python 3.7; Hail is not compatible with Python 2
python -m pip install hail
python -m pip install ipywidgets
python -m pip install jupyterlab

# Run Jupyter Lab
HOSTDIR=$(pwd)
mkdir -p $HOSTDIR/notebooks
cd $HOSTDIR/notebooks
jupyter lab # The directory where this command is run becomes Jupyter Lab's home directory
# Jupyter Lab should open automatically in your browser at http://localhost:8888
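
To confirm the installation works before the workshop (the same check applies inside the Docker setup below), a quick smoke test from a notebook or the Python shell:

import hail as hl

hl.init()  # starts the local Spark backend and prints the Hail version

# A tiny computation: sum the integers 0..9 with a Hail aggregation
t = hl.utils.range_table(10)
print(t.aggregate(hl.agg.sum(t.idx)))  # expected output: 45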

Local installation using DOCKER

Make sure you have Docker installed; otherwise, click HERE for installation instructions.

Once Docker is installed and running, execute the following commands in your terminal:

cd ~/Desktop # or your path of preference
git clone https://github.com/hms-dbmi/hail-workshop-2019.git
cd hail-workshop-2019/hail_docker

# Build image 
sudo docker build . --tag hail

# Run the container on the local host
# HOSTDIR is the repository root (strip the hail_docker suffix); -v mounts its notebooks folder
HOSTDIR=$(pwd | sed 's/\/hail_docker$//g')
sudo docker run -p 127.0.0.1:8888:8888 -d -v $HOSTDIR/notebooks:/notebooks --name hail hail

Once the container is running, open a new browser window and go to localhost:8888; Jupyter Lab should load. All your work will be saved in the notebooks folder.

Make sure you have enough disk space for the Docker images:

Repository    Size
hail          2.42 GB
miniconda3    457 MB

Deploy clusters and install Hail on cloud services

The following tools will help you spin up clusters on both Google Cloud and AWS:

Useful terminology

Cloud computing: the practice of using a network of remote servers hosted on the Internet to store, manage, and process data, rather than a local server or a personal computer.

Cluster: a collection of computers that work together to analyze data.