Genomes
Arnaud Ceol edited this page Mar 22, 2016
·
3 revisions
The location of the genomes is specified in the HTS-flow configuration file, under the property HTSFLOW_GENOMES.
An installation script will download the genomes from https://support.illumina.com/sequencing/sequencing_software/igenome.html, download or create an annotation library for R, and index the genomes.
The genomes to install are configured in conf/genomes.txt. This file is space separated and contains six columns:
- Species: name of the species, e.g. Homo sapiens
- host: UCSC is preferred, Ensembl is also accepted
- version: version of the assembly, e.g. hg19
- txdb_library: name of the R library that contains transcript annotations, e.g. TxDb.Hsapiens.UCSC.hg19.knownGene. This library will be downloaded from Bioconductor if available, or otherwise created by the script.
- annotation_library: R library for genome annotations, which should be available from Bioconductor, e.g. org.Hs.eg.db
- ucsc_table: name of the UCSC table that contains information about the genes. e.g. knownGene, refGene, sgdGene
Run the installation script:
export HTSFLOW_CONF=<path to the configuration file>
cd pipeline/
<R installation directory>/bin/Rscript downloadGenomes.R ../conf/genomes.txt
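Because the script downloads and indexes genomes, it can save time to check the inputs before launching it. The following is a minimal sketch of such a pre-flight check, not part of HTS-flow itself: the function name check_genome_setup and the variable R_HOME_DIR (standing in for the R installation directory) are hypothetical, and the check only verifies that the files exist, not that their contents are valid.

```shell
#!/bin/sh
# Hypothetical helper: verifies the three inputs used by the commands above.
# It does not inspect the contents of the configuration or of genomes.txt.
check_genome_setup() {
    conf_file="$1"      # HTS-flow configuration file (the value of HTSFLOW_CONF)
    genomes_file="$2"   # genome list, e.g. ../conf/genomes.txt
    rscript="$3"        # Rscript binary inside the R installation directory

    if [ -z "$conf_file" ] || [ ! -r "$conf_file" ]; then
        echo "HTSFLOW_CONF is unset or not readable: '$conf_file'" >&2
        return 1
    fi
    if [ ! -r "$genomes_file" ]; then
        echo "genome list not readable: '$genomes_file'" >&2
        return 1
    fi
    if [ ! -x "$rscript" ]; then
        echo "Rscript not executable: '$rscript'" >&2
        return 1
    fi
    return 0
}

# Example use from the pipeline/ directory (R_HOME_DIR is an assumed variable):
#   check_genome_setup "$HTSFLOW_CONF" ../conf/genomes.txt "$R_HOME_DIR/bin/Rscript" \
#       && "$R_HOME_DIR/bin/Rscript" downloadGenomes.R ../conf/genomes.txt
```

The check returns a non-zero status on the first missing input, so chaining it with && prevents the installation script from starting with an incomplete setup.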