
Data structure

Arnaud Ceol edited this page Jan 30, 2016 · 2 revisions

The data in HTS-flow is divided between input and output files. The locations of the input and output folders can be set in the configuration file (see installation).

Input

  • contaminant_list.txt: used for quality control (FastQC), after read trimming in primary analysis (fastqSorter2.py), for downsampling alignment files (random_line_extraction.using_ration.pl), and for calling the Wellington tool in footprint analysis (runWellington.py).
  • genomes: contains all the reference genomes used by HTS-flow (see below on how to install a new genome)
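The input folder is therefore expected to hold a contaminant_list.txt file and a genomes directory. The following is a minimal sketch, not part of HTS-flow, that checks whether a given input folder has this layout before a run. The function name missing_input_entries is hypothetical, and the folder path is passed in directly rather than read from the HTS-flow configuration file, whose format is not described here.

```python
from pathlib import Path

# Entries that this page says the HTS-flow input folder contains.
EXPECTED_INPUT_FILES = ["contaminant_list.txt"]
EXPECTED_INPUT_DIRS = ["genomes"]


def missing_input_entries(input_dir):
    """Return the names of expected input entries that are absent from input_dir.

    Hypothetical helper: input_dir is given explicitly instead of being
    read from the HTS-flow configuration file.
    """
    root = Path(input_dir)
    # Files must exist as regular files, directories as directories.
    missing = [name for name in EXPECTED_INPUT_FILES if not (root / name).is_file()]
    missing += [name for name in EXPECTED_INPUT_DIRS if not (root / name).is_dir()]
    return missing
```

For example, `missing_input_entries("/path/to/input")` returns an empty list when both entries are present, and otherwise the names that still need to be created.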

Output

  • ALN: alignment and index files produced by primary analysis (these files use a large amount of disk space).
  • BW:
  • COUNT: read counts per gene produced by primary analysis. These files are generated only for RNA-Seq samples.
  • FastQC:
  • QC: results of the FastQC tool on the alignment files stored in the ALN/ folder.
  • preprocess: temporary data produced by primary analysis. Its content is removed at the end of each run.
  • primary: results of primary analysis
  • secondary: results of secondary analysis
  • users: a folder is created for each user in this directory. The user's directory contains the output of that user's analyses. For each analysis a folder is created whose name is composed of one letter (P: primary analysis, S: secondary analysis, M: merging) followed by the id of the analysis. It contains the scripts, output and log files of the analysis.
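The naming convention for analysis folders under users/ can be decoded mechanically. Below is a minimal sketch, not code from HTS-flow, that maps a folder name to its analysis type and id and lists the analyses in one user's directory. It assumes the id is a non-negative integer written directly after the letter with no separator (for example P42); the helper names and the user name in the usage note are hypothetical.

```python
import re
from pathlib import Path

# Prefix letters of per-analysis folders, as described on this page.
ANALYSIS_TYPES = {"P": "primary", "S": "secondary", "M": "merging"}

# Assumption: the analysis id is a number appended directly after the letter.
_FOLDER_RE = re.compile(r"^([PSM])(\d+)$")


def parse_analysis_folder(name):
    """Split a folder name such as 'P42' into ('primary', 42); return None if it does not match."""
    match = _FOLDER_RE.match(name)
    if match is None:
        return None
    return ANALYSIS_TYPES[match.group(1)], int(match.group(2))


def list_user_analyses(users_dir, user):
    """Return (type, id, path) for each analysis folder in users_dir/user, sorted by folder name."""
    results = []
    for path in sorted(Path(users_dir, user).iterdir()):
        parsed = parse_analysis_folder(path.name)
        # Skip plain files and anything not following the letter+id convention.
        if path.is_dir() and parsed is not None:
            results.append((*parsed, path))
    return results
```

For example, `list_user_analyses("/path/to/output/users", "alice")` would report a folder named S7 as a secondary analysis with id 7. Sorting is by folder name, so P10 is listed before P2.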