GitHub - KarakulakTulay/gtex-tcga-data-download: Describes how to download RNA-Seq controlled data from GTEx via sra-toolkit and from TCGA via GDC Data Transfer Tool

Download GTEx and TCGA Controlled RNA-Seq Data

To access controlled data in GTEx and TCGA, you must first apply for data access. More information on how to apply can be found here.

The following steps assume that you already have access to both databases and want to download the data to your server.

GTEx

GTEx data are stored on the database of Genotypes and Phenotypes (dbGaP).

Download SRA and BAM RNA-Seq controlled data via dbGaP

Steps

1. Log in to your account (https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login).
2. Choose 'My Projects' under the 'Authorized Access' button.
3. Click on 'run selector' next to the project you want to work on.
4. Check the box 'DATASTORE filetype' under 'Filters List', then check the boxes 'SRA' and 'BAM' under 'DATASTORE filetype'.
5. Check 'Assay Type' and click on 'rna-seq'.
6. Search for data (e.g. Prostate, Skin, an SRR ID), then select the items you want to download (you can select all files).
7. Download the metadata. The metadata file includes the SRR IDs in its first column, which are used later to download the data.
8. Download the repository key by clicking 'get dbGaP repository key' next to your project under the 'My Projects' section.
9. Download the latest version of sra-toolkit:

GitHub Page for sra-toolkit: https://github.com/ncbi/sra-tools
Download Page for sra-toolkit: https://github.com/ncbi/sra-tools/wiki/01.-Downloading-SRA-Toolkit
User's Guide: https://www.ncbi.nlm.nih.gov/sra/docs/sra-dbgap-download/

Command lines to download the data

9.1 Take SRR IDs from the metadata file
# Print the first comma-separated column, skipping the header row
awk -F',' 'NR > 1 {print $1}' SraRunTable.txt > SRR_Acc_List.txt
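Before starting a long download, it can help to confirm that the accession list contains only run IDs. The following is a minimal sketch, not part of the original workflow: `extract_accessions` is a hypothetical helper name, and it assumes the run table is comma-separated with the run accession in the first column.

```shell
# Hypothetical helper: print the unique run accessions found in the first
# column of a comma-separated run table.
extract_accessions() {
    local table="$1"
    # Strip optional double quotes, then keep only values that look like
    # run accessions (SRR/ERR/DRR followed by digits). This also drops the
    # header row, because the header is not an accession.
    awk -F',' '{ gsub(/"/, "", $1); if ($1 ~ /^[SED]RR[0-9]+$/) print $1 }' "$table" | sort -u
}

# Example usage:
#   extract_accessions SraRunTable.txt > SRR_Acc_List.txt
```

Filtering by pattern rather than by line number means a stray header or blank line never reaches `prefetch`.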

9.2 Fetch the data from dbGaP
for srr in $(cat SRR_Acc_List.txt); do sratoolkit.2.11.2-ubuntu64/bin/prefetch --max-size 24G --ngc prj.ngc $srr; done >& out.txt &
This command downloads the SRA files.

9.3 Convert SRA data into FASTQ files and compress them
for srr in $(cat SRR_Acc_List.txt); do (cd "$srr" && ../sratoolkit.2.11.2-ubuntu64/bin/fasterq-dump *.sra -e 8 --ngc ../prj.ngc && gzip *.fastq); done >& out_conversion.txt &
This command generates two FASTQ files if the run is paired-end; otherwise it generates one FASTQ file. The ../ prefixes assume that the toolkit directory and prj.ngc sit in the directory that contains the per-run folders, as in step 9.2.
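Because the download and conversion loops run in the background, a failed run is easy to miss. The sketch below lists accessions that have no compressed FASTQ output yet. It is an illustration only: `list_incomplete_runs` is a hypothetical helper, and it assumes each run was converted inside a directory named after its accession, relative to the current directory.

```shell
# Hypothetical helper: print every accession from the list that has no
# <accession>/*.fastq.gz file under the current directory.
list_incomplete_runs() {
    local acc_list="$1" srr
    # The "|| [ -n ... ]" part keeps a final line that lacks a newline.
    while IFS= read -r srr || [ -n "$srr" ]; do
        [ -n "$srr" ] || continue    # skip blank lines
        # ls exits non-zero when the glob matches no file
        if ! ls "$srr"/*.fastq.gz >/dev/null 2>&1; then
            echo "$srr"
        fi
    done < "$acc_list"
}

# Example usage:
#   list_incomplete_runs SRR_Acc_List.txt > SRR_Retry_List.txt
```

The resulting list can be fed back into the loops in 9.2 and 9.3 to retry only the runs that did not finish.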

TCGA

TCGA data are stored on the Genomic Data Commons Data Portal (GDC Data Portal). Documentation on how to download and upload data: https://docs.gdc.cancer.gov/Data_Transfer_Tool/Users_Guide/Preparing_for_Data_Download_and_Upload/

Steps

1. Log in to your account: https://portal.gdc.cancer.gov/
2. Download the token from the dropdown menu under your name after you log in. The token file is required to download controlled data to your server.
3. In the search box on the main page, search for the data you have access to and want to work on.
4. Choose 'RNA-Seq' under the experimental strategy section.
5. Go to the 'Files' section on the left-hand side of the page and select 'bam' under the 'Data Format' section.
6. Select 'Files' on the main page and add everything (or part of the data) to the Cart.
7. Click on the Cart (right corner of the page).
8. Under the 'Download' section, download the manifest file.
9. Download the GDC Data Transfer Tool:

Download Page for GDC-Data-Transfer-Tool: https://gdc.cancer.gov/access-data/gdc-data-transfer-tool
User's Guide: https://docs.gdc.cancer.gov/Data_Transfer_Tool/Users_Guide/Getting_Started/

Command line to download BAM files
gdc-client download -m manifest_file -t token_file
token_file: see step 2
manifest_file: see step 8
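After the download finishes, the checksums in the manifest can be used to confirm that the BAM files arrived intact. The following is a rough sketch under stated assumptions, not an official GDC procedure: `verify_manifest` is a hypothetical helper; it assumes the manifest is tab-separated with a header row and the columns id, filename, md5 in that order, that each file was saved as `<id>/<filename>` under the current directory, and that `md5sum` is available (as on most Linux servers).

```shell
# Hypothetical helper: compare downloaded files against manifest checksums.
# Prints one status line per file and returns non-zero if any file is
# missing or has a different MD5 checksum.
verify_manifest() {
    local manifest="$1" id filename md5 rest actual status=0
    while IFS=$'\t' read -r id filename md5 rest || [ -n "$id" ]; do
        [ -n "$id" ] || continue              # skip blank lines
        [ "$id" = "id" ] && continue          # skip the header row
        if [ ! -f "$id/$filename" ]; then
            echo "MISSING  $id/$filename"
            status=1
            continue
        fi
        actual=$(md5sum "$id/$filename" | awk '{print $1}')
        if [ "$actual" = "$md5" ]; then
            echo "OK       $id/$filename"
        else
            echo "MISMATCH $id/$filename"
            status=1
        fi
    done < "$manifest"
    return "$status"
}

# Example usage, run from the download directory:
#   verify_manifest manifest_file
```

Files reported as MISSING or MISMATCH can be collected into a smaller manifest and downloaded again.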
