Data structure
Arnaud Ceol edited this page Jan 30, 2016
The data in HTS-flow is divided between input and output files. The input and output folders can be set in the configuration file (see installation).
- **contaminant_list.txt**: used for quality control (FastQC), after the trimming of the reads in primary analysis (fastqSorter2.py), for downsampling alignment files (random_line_extraction.using_ration.pl) and for calling the Wellington tool in footprint analysis (runWellington.py).
- genomes: contains all the reference genomes used by HTS-flow (see below for how to install a new genome).
- ALN: alignment and index files produced by primary analysis (these files use a large amount of disk space).
- BW:
- COUNT: read counts per gene produced by primary analysis. These files are generated only for RNA-Seq samples.
- FastQC:
- QC: results of running the FastQC tool on the alignment files stored in the ALN/ folder.
- preprocess: temporary data produced by primary analysis. Its content is removed at the end of each run.
- primary: results of primary analysis.
- secondary: results of secondary analysis.
- users: a folder is created for each user in this directory. The user's directory contains the output of the analyses. For each analysis a folder is created, whose name consists of one letter (P: primary analysis, S: secondary analysis, M: merging) followed by the id of the analysis. This folder contains the scripts, output and log files of the analysis.
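
The naming convention for the per-analysis folders can be illustrated with a small Python sketch. This is not part of HTS-flow: the function `parse_analysis_folder` and the `AnalysisFolder` type are hypothetical names, and the sketch assumes that the analysis id is purely numeric, which the text above does not state.

```python
import re
from typing import NamedTuple

# Prefix letters as described above: P, S and M.
ANALYSIS_TYPES = {"P": "primary", "S": "secondary", "M": "merging"}

# Assumption: the id that follows the letter consists of digits only.
_FOLDER_RE = re.compile(r"([PSM])(\d+)")


class AnalysisFolder(NamedTuple):
    """Hypothetical container for a parsed analysis folder name."""
    kind: str
    analysis_id: str


def parse_analysis_folder(name: str) -> AnalysisFolder:
    """Split a folder name such as 'P12' into its type and id."""
    match = _FOLDER_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not an analysis folder name: {name!r}")
    letter, analysis_id = match.groups()
    return AnalysisFolder(ANALYSIS_TYPES[letter], analysis_id)
```

Under these assumptions, `parse_analysis_folder("S7")` yields a secondary analysis with id `"7"`, and any name that does not follow the pattern raises a `ValueError`.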