Genomes

Arnaud Ceol edited this page Mar 22, 2016 · 3 revisions

The location of the genomes is specified in the HTS-flow configuration file, under the property HTSFLOW_GENOMES.

An installation script downloads the genomes from https://support.illumina.com/sequencing/sequencing_software/igenome.html, downloads or creates an annotation library for R, and indexes the genomes.

The genomes to install are configured in conf/genomes.txt. This file is space-separated and contains six columns:

  • Species: name of the species, e.g. Homo sapiens
  • host: UCSC is preferred, Ensembl is also accepted
  • version: version of the assembly, e.g. hg19
  • txdb_library: name of the R library that contains transcript annotations. This library is downloaded from Bioconductor if available, otherwise it is created by the script. e.g. TxDb.Hsapiens.UCSC.hg19.knownGene
  • annotation_library: R library for genome annotations; it should be available from Bioconductor. e.g. org.Hs.eg.db
  • ucsc_table: name of the UCSC table that contains information about the genes. e.g. knownGene, refGene, sgdGene

Run the installation script:

export HTSFLOW_CONF=<path to the configuration file>

cd pipeline/

<R installation directory>/bin/Rscript downloadGenomes.R ../conf/genomes.txt
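
Since the installation depends on HTSFLOW_CONF pointing at the configuration file, it can help to check the variable before launching the script. The following is only a minimal sketch: the function name check_htsflow_conf is hypothetical and not part of HTS-flow, and it assumes the script only needs the file to exist and be readable.

```shell
#!/bin/sh
# Hypothetical helper (not part of HTS-flow): succeeds only if
# HTSFLOW_CONF is set and names a readable file.
check_htsflow_conf() {
    # Fail if the variable is unset or empty.
    if [ -z "${HTSFLOW_CONF:-}" ]; then
        echo "HTSFLOW_CONF is not set" >&2
        return 1
    fi
    # Fail if the path cannot be read.
    if [ ! -r "$HTSFLOW_CONF" ]; then
        echo "Cannot read configuration file: $HTSFLOW_CONF" >&2
        return 1
    fi
    return 0
}
```

After defining the function in the current shell, call check_htsflow_conf once HTSFLOW_CONF has been exported and continue with the Rscript command only if it returns successfully.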