Run Demo Arabidopsis 360 Population

Yen On Chan edited this page May 16, 2020 · 10 revisions

To run the Arabidopsis 360 Demo, users must download the raw data file, genotype data files, genotype map files, Haploview files, and GFF files from CyVerse. This requires a CyVerse account: go to https://de.cyverse.org/de and click register to create one. Users who already have a CyVerse account can skip this step.

The iCommands software must be installed on the machine used to download the data. To install iCommands and initialize the iRODS connection, follow the instructions at https://cyverse.atlassian.net/wiki/spaces/DS/pages/241869823/Setting+Up+iCommands.
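Before starting the downloads, it can help to confirm that the iCommands used later on this page are on the PATH. The following is a minimal sketch and not part of HAPPI GWAS; the function name missing_icommands is made up for illustration.

```shell
# Report which of the iCommands used in this demo are missing from PATH.
# Prints one missing command name per line; prints nothing if all are found.
# (missing_icommands is a hypothetical helper, not part of iCommands or HAPPI GWAS.)
missing_icommands() {
    local cmd
    for cmd in iinit icd ils iget; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}
```

Calling missing_icommands should print nothing when the installation is complete. Once iinit has been run, ils /iplant/home/angelovici_lab/HAPPI_GWAS/ should list the collection if the iRODS connection is working.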

After installing the iCommands software and initializing the iRODS connection, users must navigate to the reference files directory and download the required files (the raw data file, genotype data files, genotype map files, Haploview files, and GFF files) from CyVerse. The genotype data, genotype map, Haploview, and GFF files are each split by chromosome into five files (one for each of the five chromosomes in the Arabidopsis genome). The Arabidopsis phenotypic data is provided in raw form and was obtained from Angelovici et al. (2013). Use the commands below to complete all of the downloads.

cd /path/to/HAPPI_GWAS/reference_files
icd /iplant/home/angelovici_lab/HAPPI_GWAS/
iget -rfvTK /iplant/home/angelovici_lab/HAPPI_GWAS/gene_annotation_files ./
iget -rfvTK /iplant/home/angelovici_lab/HAPPI_GWAS/genotype_files ./
iget -rfvTK /iplant/home/angelovici_lab/HAPPI_GWAS/haploview_files ./
cd ../raw_data
iget -rfvTK /iplant/home/angelovici_lab/HAPPI_GWAS/raw_data/Arabidopsis_360 ./
cd ../HAPPI.GWAS
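Once the iget commands finish, a quick check that each per-chromosome file set is complete can save a failed run later. The sketch below assumes the files keep the names quoted in Step 1 (Call_Method_75_GD[1-5].txt, Chr[1-5].haploview.txt, Chr[1-5].txt) and sit somewhere under the reference files directory; the function names are hypothetical, and the genotype map files are not checked because their names are not given on this page.

```shell
# Count regular files under directory $1 whose base name matches glob $2.
count_files() {
    find "$1" -type f -name "$2" | wc -l | tr -d ' '
}

# Print OK or MISSING for each per-chromosome file set named in this tutorial.
# Returns non-zero if any set does not contain exactly five files.
# (Hypothetical helper; file names are taken from Step 1 of this page.)
check_reference_files() {
    local ref_dir="$1"
    local pattern n status=0
    for pattern in 'Call_Method_75_GD[1-5].txt' 'Chr[1-5].haploview.txt' 'Chr[1-5].txt'; do
        n=$(count_files "$ref_dir" "$pattern")
        if [ "$n" -eq 5 ]; then
            echo "OK       $pattern"
        else
            echo "MISSING  $pattern (expected 5, found $n)"
            status=1
        fi
    done
    return "$status"
}
```

For example, check_reference_files /path/to/HAPPI_GWAS/reference_files should print three OK lines after a complete download.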

After all files have downloaded successfully, run the Arabidopsis 360 Demo by following these steps:

Step 1: Edit the Arabidopsis360.yaml file:
----> a. Edit the “Raw Data” section. Ensure the path and file name are correct. In this tutorial, we will be using raw data, so the “BLUP or BLUE” section is left blank. The first column in the data file is the Line (i.e. Accession ID), the second is the population (i.e. replicate), and subsequent columns are the phenotypic data in raw form.
----> b. Edit the “GAPIT3” section. Ensure the path at line “GAPIT_genotype_file_path” is correct. We will be using genotype data in the numeric format, from the files named Call_Method_75_GD[1-5].txt. Note that MLM is the selected model; all other model options are commented out with # and are therefore ignored. SNPs are filtered at a MAF of 0.05, with an FDR significance threshold of 0.05. An average LD decay of 5,000 bp is used; therefore, we chose a GAPIT_LD_number of 5000 (bp on each side of the significant SNP).
----> c. Edit the “Haploview” section. Ensure the path at line “Haploview_file_path” is correct. We will be using Chr[1-5].haploview.txt files.
----> d. Edit the “Match Gene Start and Stop” section. Ensure the path at line “GFF_file_path” is correct. We will be using Chr[1-5].txt files.
----> e. Edit the “Output Directory” section. Ensure the path on line “output” is correct.

** Please refer to the user manual for the example Arabidopsis360.yaml file.
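After editing, it can be useful to print just the settings touched in Step 1 and read them back in one place. This is a rough sketch: show_demo_settings is a hypothetical helper, it simply searches for the key names mentioned above, and it makes no assumption about how the YAML file is nested.

```shell
# Print, with line numbers, the uncommented lines of YAML file $1 that mention
# the settings edited in Step 1. Lines starting with # are skipped.
# (Hypothetical helper; it greps for key names and does not parse YAML.)
show_demo_settings() {
    grep -nE 'GAPIT_genotype_file_path|GAPIT_LD_number|Haploview_file_path|GFF_file_path|output' "$1" \
        | grep -vE '^[0-9]+:[[:space:]]*#'
}
```

Running show_demo_settings Arabidopsis360.yaml should list each path so it can be compared against the directories downloaded earlier.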

Step 2: Run HAPPI GWAS using the following command:

Rscript HAPPI_GWAS.R -generateBLUP -GAPIT -extractHaplotype -searchGenes Arabidopsis360.yaml

Note that the generateBLUP option is used, so Line and population are fit as random effects in the model. The “BLUP or BLUE” section in Arabidopsis360.yaml is left blank.
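Because the run can fail partway if something is missing, one option is to wrap the command in a small pre-flight check. The wrapper below is only a sketch under the assumption that it is called from the directory containing HAPPI_GWAS.R; run_demo is a hypothetical name, and the Rscript line is the same command shown in Step 2.

```shell
# Run the demo only if the script, the YAML file, and Rscript can all be found.
# $1 is the YAML file (defaults to Arabidopsis360.yaml).
# (Hypothetical wrapper around the Step 2 command.)
run_demo() {
    local config="${1:-Arabidopsis360.yaml}"
    [ -f HAPPI_GWAS.R ] || { echo "HAPPI_GWAS.R not found in $(pwd)" >&2; return 1; }
    [ -f "$config" ] || { echo "YAML file not found: $config" >&2; return 1; }
    command -v Rscript >/dev/null 2>&1 || { echo "Rscript is not on PATH" >&2; return 1; }
    Rscript HAPPI_GWAS.R -generateBLUP -GAPIT -extractHaplotype -searchGenes "$config"
}
```

Calling run_demo with no argument uses Arabidopsis360.yaml; each failed check prints a message and returns non-zero without starting R.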

Step 3: Access the output data in the output directory:

cd < user-defined output path found in the Output Directory section of the YAML file >