Pre-/post-processing pipelines and results for CUT&TAG data generated by the Neurogenomics Lab @ Imperial College London.
Links to all results. Each subheader is the unique ID of a given sequencing batch, assigned by the Imperial BRC Genomics Facility.
Native ChIP-seq (GSE66023) vs. ENCODE ChIP-seq (ENCSR000AKP-ENCFF038DDS)
- Using peak files from GEO supp files provided by original authors.
- Standardised peaks after reprocessing the data from fastq files by Leyla Abbasova.
- Individual peak files, Reference=ENCODE_H3K27ac
- Individual peak files, Reference=ENCODE_H3K27me3
- Consensus peak files, Reference=ENCODE_H3K27ac
- Consensus peak files (from C&T peak files), Reference=ENCODE_H3K27me3
- Pooled peak files (from C&T BAM files), Reference=ENCODE_H3K27ac
- Reference=ENCODE_H3K27ac: Comparison of CUT&Tag, CUT&Run and TIP-seq data generated by the Imperial Neurogenomics Lab vs. ENCODE.
Description: Initial test run of four samples (two H3K27ac + two H3K27me3). Accidentally merged libraries across assay types (H3K27ac/H3K27me3) during the nf-core/atacseq run (will fix).
When BRC sends you an email letting you know they've finished sequencing your samples, follow these steps to download and prepare the data.
Note: File and folder names are just used as examples here. You'll need to adapt these to match your particular file/folder names.
- Log onto HPC.
- If you haven't done so already, set up your irods credentials (instructions here). You only need to do this once.
- Move into the folder where you want to store your data.
- Download the data with `irods`:

```
module load irods/4.2.0
iget -Pr /igfZone/home/di.hu/IGFQ001187_hu_10-5-2021_scCutandTag/fastq/2021-05-11/HL25VBBXY
cd HL25VBBXY
```
- Unpack each `.tar` file:

```
tar -xvf IGFQ001187_hu_10-5-2021_scCutandTag_4_16_2021-05-11.tar
tar -xvf IGFQ001187_hu_10-5-2021_scCutandTag_6_16_2021-05-11.tar
```
- Remove the old files (once you're sure the previous step worked):

```
rm IGFQ001187_hu_10-5-2021_scCutandTag_4_16_2021-05-11.tar
rm IGFQ001187_hu_10-5-2021_scCutandTag_6_16_2021-05-11.tar
```
- Optional: Change permissions recursively so that other members of your team can access and manipulate the files. Make sure to adapt the scope of the permissions to whatever is appropriate for your case.

```
chmod -R u=rwx,go=rx ../HL25VBBXY/
```
- Platform: nf-core (nextflow + singularity/docker)
- Discussion on adapting this pipeline for CUT&RUN data.
- Docker isn't allowed on the HPC by itself because it presents a security risk. Instead, follow these instructions to create an R-based Docker container (Rocker) inside a Singularity container.
- By default, Singularity [bind mounts](https://singularity.lbl.gov/quickstart) `/home/$USER`, `/tmp`, and `$PWD` into your container at runtime.

```
mkdir -p /rds/general/user/$USER/ephemeral/tmp/
mkdir -p /rds/general/user/$USER/ephemeral/rtmp/
```
- On the HPC, Rocker containers can be run through Singularity with a single command, much like native Docker commands, e.g.:

```
singularity exec docker://rocker/tidyverse:latest R
```
- !IMPORTANT! You may need to change the path "/rds/general/user/$USER/home/R/x86_64-redhat-linux-gnu-library/3.6/" to the actual location of your R library.
- Run Rocker within Singularity:

```
singularity exec -B /rds/general/user/$USER/ephemeral/tmp/:/tmp,/rds/general/user/$USER/ephemeral/tmp/:/var/tmp,/rds/general/user/$USER/ephemeral/rtmp/:/rds/general/user/$USER/home/R/x86_64-redhat-linux-gnu-library/3.6/ --writable-tmpfs docker://rocker/tidyverse:latest R
```
Now you can download the nf-core/atacseq Singularity container via DockerHub.
- This will download "atacseq_latest.sif" to your home directory:

```
singularity pull docker://nfcore/atacseq:latest
```

- Copy this .sif file to the cacheDir specified in your nextflow config file:

```
scp ~/atacseq_latest.sif /rds/general/user/$USER/projects/neurogenomics-lab/live/.singularity-cache/
```
- Once you have the container downloaded, you can specify it via the `-profile` flag in the main pipeline (see below).
- More info on this process is on the lab Wiki.
The config file tells nextflow how to run on Imperial's HPC.

```
module load nextflow
```

- Copy the config file to the expected location so HPC knows how to run nextflow properly:

```
scp hpc_config $HOME/.nextflow/config
```
- Register with nextflow-tower according to Combiz's instructions to get real-time reports as the pipeline runs. Once registered, add the token to your config file.
- Run the nextflow pipeline. See here for all parameter options.
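To make the pieces concrete, a custom config along these lines would enable Singularity, point at the shared cacheDir, and hold your tower token. This is only a sketch: the executor name, cacheDir path, and token below are placeholders to adapt to your own setup, not the lab's actual config.

```
// Hypothetical sketch of $HOME/.nextflow/config; all values are placeholders.
singularity {
    enabled    = true
    autoMounts = true
    cacheDir   = '/rds/general/user/USERNAME/projects/neurogenomics-lab/live/.singularity-cache'
}

process {
    executor = 'pbspro'  // adjust to the scheduler your HPC actually uses
}

tower {
    enabled     = true
    accessToken = 'YOUR_TOWER_TOKEN'  // from your nextflow-tower registration
}
```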
- In theory, nf-core/atacseq should download the Singularity image automatically when it runs. However, in practice, downloading it this way either takes far too long and/or fails entirely.
- Therefore, per Narun Fancy's recommendation: "download the singularity image outside of the pipeline and save in the same dir as the cacheDir path for the singularity option in the custom config file".

```
/rds/general/user/$USER/projects/neurogenomics-lab/live/.singularity-cache
```
- For more info on the `-profile` flag, see here.
- `--input`: Path to design file.
- `--genome`: Genome build your fastq files are in.
- `-profile`: Path to container profile.
```
nextflow run nf-core/atacseq --input raw_data/HK5M2BBXY/design.csv --genome GRCh37 -r 1.2.1 -profile /rds/general/user/$USER/projects/neurogenomics-lab/live/.singularity-cache/atacseq_latest.sif
```
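For reference, the design file passed to `--input` in nf-core/atacseq v1.x is a CSV with `group,replicate,fastq_1,fastq_2` columns. The sample names and paths below are made up for illustration; use your own fastq locations.

```
group,replicate,fastq_1,fastq_2
H3K27ac,1,raw_data/HK5M2BBXY/H3K27ac_rep1_S1_L001_R1_001.fastq.gz,raw_data/HK5M2BBXY/H3K27ac_rep1_S1_L001_R2_001.fastq.gz
H3K27me3,1,raw_data/HK5M2BBXY/H3K27me3_rep1_S2_L001_R1_001.fastq.gz,raw_data/HK5M2BBXY/H3K27me3_rep1_S2_L001_R2_001.fastq.gz
```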
- Platform: python
- Platform: workflowr (R + CLI)
- Platform: CLI
- Excerpts from the full BRC Genomics help page
Illumina uses the following file name convention for the output fastq files
For example: samplename_S1_L001_R1_001.fastq.gz
- samplename: Name of the sample provided in the samplesheet
- S1: Number of the sample based on the sample order in the samplesheet
- L001: Lane number of the flowcell
- R1: The read; e.g. R1 indicates Read 1 and R2 indicates Read 2 of a paired-end run
- 001: It's always 001
- .fastq.gz: File extension; it's a gzipped fastq file
Please check the Illumina BCL2Fastq documentation for more information.
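The naming convention above can be captured in a small parser. This is an illustrative sketch (the `parse_fastq_name` helper is not part of any BRC or Illumina tooling):

```python
import re

# Matches Illumina-style fastq names such as samplename_S1_L001_R1_001.fastq.gz,
# following the convention described above.
FASTQ_RE = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>R[12])_001\.fastq\.gz$"
)

def parse_fastq_name(filename: str) -> dict:
    """Return the fields encoded in an Illumina fastq filename."""
    match = FASTQ_RE.match(filename)
    if match is None:
        raise ValueError(f"Not an Illumina-style fastq name: {filename}")
    return match.groupdict()

print(parse_fastq_name("samplename_S1_L001_R1_001.fastq.gz"))
# → {'sample': 'samplename', 'sample_number': '1', 'lane': '001', 'read': 'R1'}
```

This can be handy for sanity-checking that downloaded files pair up correctly (every R1 should have a matching R2 with identical sample, sample number, and lane).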