Data structure

The data in HTS Flow is divided between input and output files. The input and output folders can be set in the configuration file (see installation).

Input

**contaminant_list.txt**: used for quality control (FastQC), after the trimming of the reads in primary analysis (fastqSorter2.py), for downsampling alignment files (random_line_extraction.using_ration.pl) and for calling the Wellington tool in footprint analysis (runWellington.py).
genomes: contains all the reference genomes used by HTS-flow (see below for how to install a new genome).

Output

ALN: alignment and index files produced by primary analysis (these files use a large amount of disk space).
BW:
COUNT: read counts per gene produced by primary analysis. These files are generated only for RNA-Seq samples.
FastQC:
QC: results from FastQC tool on alignment files stored in the ALN/ folder.
preprocess: temporary data produced by primary analysis. Its content is removed at the end of each run.
primary: results of primary analysis
secondary: results of secondary analysis
users: a folder is created for each user in this directory. The user’s directory contains the output of the analyses. A folder is created for each analysis; its name consists of one letter (P: primary analysis, S: secondary analysis, M: merging) followed by the id of the analysis. It contains the scripts, output and log files of the analysis.
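
The naming convention for analysis folders can be illustrated with a short Python sketch. This is not code from HTS Flow: the function name `parse_analysis_folder` is hypothetical, and the sketch assumes that the id is a plain integer written directly after the letter, with no separator.

```python
import re
from typing import NamedTuple

# Letter prefixes listed in the description of the users folder.
ANALYSIS_TYPES = {"P": "primary", "S": "secondary", "M": "merging"}

# Assumption: the id is a plain integer placed directly after the letter.
_FOLDER_RE = re.compile(r"([PSM])(\d+)")


class AnalysisFolder(NamedTuple):
    kind: str         # "primary", "secondary" or "merging"
    analysis_id: int  # id of the analysis


def parse_analysis_folder(name: str) -> AnalysisFolder:
    """Split a folder name such as 'P12' into its analysis type and id."""
    match = _FOLDER_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not an analysis folder name: {name!r}")
    letter, digits = match.groups()
    return AnalysisFolder(ANALYSIS_TYPES[letter], int(digits))


if __name__ == "__main__":
    for folder in ("P12", "S7", "M3"):
        print(folder, parse_analysis_folder(folder))
```

Under these assumptions, a folder named `P12` in a user's directory would hold the scripts, output and log files of primary analysis 12. Names that do not match the pattern raise a `ValueError` rather than being silently accepted.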