We have developed nf-HiChIP, a pipeline that combines the analytical approach used for ChIP-seq data processing (mapping, filtering, peak calling, coverage track calculation) with HiChIP-specific analysis (the MAPS pipeline; Juric et al.). The pipeline lets users analyze multiple HiChIP datasets simultaneously, thoroughly and efficiently, without requiring additional ChIP-seq experiments. This workflow is based on the reference implementation of the method designed by Zofia Tojek. The original version is available here.
- You can get familiar with Nextflow options. The -resume flag allows you to execute the pipeline from the last successful step.
- For more details, see the Nextflow documentation.
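As a minimal sketch, resuming an interrupted run could look like the following. The /opt/nextflow path, main.nf, and design.csv are the names used elsewhere in this README; the guard around the call is only an illustrative addition so the snippet does nothing harmful outside the container. Nextflow looks up its cache from the launch directory, so the command is assumed to be started from the same directory as the original run.

```shell
# Re-run the original command with -resume appended.
# Completed steps are taken from the cache; only the remaining steps are executed.
NEXTFLOW=/opt/nextflow   # location of Nextflow inside the container, as used in this README

if [ -x "$NEXTFLOW" ]; then
    "$NEXTFLOW" run main.nf --design design.csv -resume
else
    echo "Nextflow not found at $NEXTFLOW (run this inside the container)"
fi
```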
Docker image available:
https://hub.docker.com/repository/docker/mateuszchilinski/hichip-nf-pipeline/general
Command to run the Docker image (use -v to bind the folder with your data):
docker run -v /path_to_your_data/:/data_in_container/ -it mateuszchilinski/hichip-nf-pipeline:latest bash
Required Files for Reference Folder (Total 6 files) -
1. Reference fasta files -
> Homo_sapiens_assembly38.fasta
2. BWA Reference Index files -
> Homo_sapiens_assembly38.fasta.amb
> Homo_sapiens_assembly38.fasta.ann
> Homo_sapiens_assembly38.fasta.bwt
> Homo_sapiens_assembly38.fasta.pac
> Homo_sapiens_assembly38.fasta.sa
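Before starting a run, it can help to confirm that all six files are in place. The sketch below is one way to do that; the function name `missing_reference_files` and the example path are hypothetical and not part of the pipeline.

```shell
# Print every required reference file that is missing for a given FASTA.
# Prints nothing when the FASTA and its five BWA index files all exist.
missing_reference_files() {
    local fasta="$1"
    local ext
    # The reference FASTA itself
    [ -f "$fasta" ] || echo "$fasta"
    # The five BWA index files listed above
    for ext in amb ann bwt pac sa; do
        [ -f "$fasta.$ext" ] || echo "$fasta.$ext"
    done
    return 0
}

# Example usage (placeholder path):
# missing_reference_files /path_to_your_reference/Homo_sapiens_assembly38.fasta
```

If only the index files are reported missing, running `bwa index` on the FASTA file creates the .amb, .ann, .bwt, .pac, and .sa files next to it.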
Example 1 for design.csv file
If you have neither raw ChIP-Seq data nor processed results (narrow peaks) from a ChIP-Seq experiment
sample | fastq_1 | fastq_2 | replicate | chipseq |
---|---|---|---|---|
S1 | /data/SAMPLE1_1_R1.fastq.gz | /data/SAMPLE1_1_R2.fastq.gz | 1 | None |
S1 | /data/SAMPLE1_2_R1.fastq.gz | /data/SAMPLE1_2_R2.fastq.gz | 2 | None |
S2 | /data/SAMPLE2_1_R1.fastq.gz | /data/SAMPLE2_1_R2.fastq.gz | 1 | None |
S2 | /data/SAMPLE2_2_R1.fastq.gz | /data/SAMPLE2_2_R2.fastq.gz | 2 | None |
Note -
- Put "None" (note the capital letter) in the last column.
- In this case, pseudo-ChIP-Seq data will be generated from the HiChIP data.
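For illustration, the table above could be written to disk as follows. This is a sketch under an assumption: the file is taken to be plain comma-separated text with a header row matching the table columns, which the .csv extension suggests but this README does not state explicitly.

```shell
# Write the Example 1 design file.
# ASSUMPTION: comma-separated values with a header row named after the table columns.
cat > design.csv <<'EOF'
sample,fastq_1,fastq_2,replicate,chipseq
S1,/data/SAMPLE1_1_R1.fastq.gz,/data/SAMPLE1_1_R2.fastq.gz,1,None
S1,/data/SAMPLE1_2_R1.fastq.gz,/data/SAMPLE1_2_R2.fastq.gz,2,None
S2,/data/SAMPLE2_1_R1.fastq.gz,/data/SAMPLE2_1_R2.fastq.gz,1,None
S2,/data/SAMPLE2_2_R1.fastq.gz,/data/SAMPLE2_2_R2.fastq.gz,2,None
EOF

# List any data row whose last column is not exactly "None" (prints nothing here).
awk -F, 'NR > 1 && $5 != "None" { print "unexpected chipseq value on line " NR ": " $5 }' design.csv
```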
Example 2 for design.csv file
If you have processed ChIP-Seq experiment results (in the form of narrow peaks)
sample | fastq_1 | fastq_2 | replicate | chipseq |
---|---|---|---|---|
S1 | /data/SAMPLE1_1_R1.fastq.gz | /data/SAMPLE1_1_R2.fastq.gz | 1 | /data/SAMPLE1.narrowPeak |
S1 | /data/SAMPLE1_2_R1.fastq.gz | /data/SAMPLE1_2_R2.fastq.gz | 2 | /data/SAMPLE1.narrowPeak |
S2 | /data/SAMPLE2_1_R1.fastq.gz | /data/SAMPLE2_1_R2.fastq.gz | 1 | /data/SAMPLE2.narrowPeak |
S2 | /data/SAMPLE2_2_R1.fastq.gz | /data/SAMPLE2_2_R2.fastq.gz | 2 | /data/SAMPLE2.narrowPeak |
Note -
- Remember, the pipeline requires chromosome names in the "chrX" format (e.g., chr1, chr14, chr21) in the narrowPeak file.
- Ensure peak files follow this naming convention and the BED6+4 format.
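Both requirements can be checked before launching the pipeline. The sketch below uses a hypothetical helper, `check_narrowpeak`, and assumes tab-separated columns with no track or browser header lines; BED6+4 means ten columns per line.

```shell
# Report lines of a narrowPeak file that break either rule:
#   - each line must have 10 tab-separated columns (BED6+4)
#   - the chromosome name in column 1 must start with "chr"
# Returns non-zero if any problem was found.
check_narrowpeak() {
    awk -F'\t' '
        NF != 10     { printf "line %d: expected 10 columns, found %d\n", NR, NF; bad = 1; next }
        $1 !~ /^chr/ { printf "line %d: chromosome \"%s\" lacks the chr prefix\n", NR, $1; bad = 1 }
        END          { exit (bad ? 1 : 0) }
    ' "$1"
}

# Example usage (path taken from the table above):
# check_narrowpeak /data/SAMPLE1.narrowPeak && echo "peak file looks fine"
```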
Example 3 for design.csv file
If you have raw ChIP-Seq data but the peaks have not been called yet
sample | fastq_1 | fastq_2 | input_1 | input_2 | replicate |
---|---|---|---|---|---|
S1 | /data/SAMPLE1_1_R1.fastq.gz | /data/SAMPLE1_1_R2.fastq.gz | /data/SAMPLE1_INPUT_R1.fastq.gz | /data/SAMPLE1_INPUT_R2.fastq.gz | 1 |
S1 | /data/SAMPLE1_2_R1.fastq.gz | /data/SAMPLE1_2_R2.fastq.gz | /data/SAMPLE1_INPUT_R1.fastq.gz | /data/SAMPLE1_INPUT_R2.fastq.gz | 2 |
S2 | /data/SAMPLE2_1_R1.fastq.gz | /data/SAMPLE2_1_R2.fastq.gz | /data/SAMPLE2_INPUT_R1.fastq.gz | /data/SAMPLE2_INPUT_R2.fastq.gz | 1 |
S2 | /data/SAMPLE2_2_R1.fastq.gz | /data/SAMPLE2_2_R2.fastq.gz | /data/SAMPLE2_INPUT_R1.fastq.gz | /data/SAMPLE2_INPUT_R2.fastq.gz | 2 |
To run the pipeline with a design file like Example 1 or Example 2, use main.nf with the --design parameter (run the command inside the container):
/opt/nextflow run main.nf --design design.csv
To run the pipeline with a design file like Example 3, use main_chipseq.nf with the --design parameter (run the command inside the container):
/opt/nextflow run main_chipseq.nf --design design.csv
Example
/opt/nextflow run \
/mnt/sfglab/nf-hichip/nf-hichip/main.nf \
--ref /mnt/sfglab/Data/References/Genome/hg38/Homo_sapiens_assembly38/Homo_sapiens_assembly38.fasta \
--chrom_sizes /mnt/sfglab/Data/References/Genome/hg38/Homo_sapiens_assembly38/hg38.sizes \
--outdir /mnt/sfglab/workspaces/output/HiChIP_HG00731 \
--design /mnt/sfglab/workspaces/design/design_HiChIP_HG00731.csv \
--threads 4 \
--mem 10 \
--mapq 30 \
--peak_quality 0.01
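The --chrom_sizes file passed in the example has to match the reference FASTA. If you do not have one, the sketch below derives it directly from the FASTA. It assumes the pipeline expects the common two-column layout (chromosome name, tab, length in bases); the function name `fasta_to_chrom_sizes` is hypothetical.

```shell
# Print "name<TAB>length" for every sequence in a FASTA file.
# The name is the first word of the header line, without the leading ">".
fasta_to_chrom_sizes() {
    awk '
        /^>/ { if (name != "") print name "\t" len; name = substr($1, 2); len = 0; next }
             { len += length($0) }
        END  { if (name != "") print name "\t" len }
    ' "$1"
}

# Example usage (placeholder paths):
# fasta_to_chrom_sizes Homo_sapiens_assembly38.fasta > hg38.sizes
```

For a full human genome, a faster route is usually to index the FASTA with `samtools faidx` and take the first two columns of the resulting .fai file with `cut -f1,2`.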
The parameters of the pipeline can be found in the following table. All of them are optional:
Parameter | Description | Default |
---|---|---|
--ref | Reference genome for the analysis. | /workspaces/hichip-nf-pipeline/ref/Homo_sapiens_assembly38.fasta |
--outdir | Folder with the final results. | results |
--design | .csv file containing information about samples and replicates. | /workspaces/hichip-nf-pipeline/design_high.csv |
--chrom_sizes | Sizes of chromosomes for the specific reference genome. | /workspaces/hichip-nf-pipeline/hg38.chrom.sizes |
--threads | Number of threads to use in each task. | 4 |
--mem | Memory to use (in GB) for all samtools tasks. The value applies per thread and per sample, e.g., 4 samples with 4 threads and 4 GB would consume 64 GB of memory. | 16 |
--mapq | MAPQ for MAPS. | 30 |
--peak_quality | Quality parameter (q-value (minimum FDR) cutoff) for MACS3. | 0.05 |
--genome_size | Genome size string for MACS3. | hs |
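When choosing --threads and --mem, it may help to estimate the total memory first. The sketch below simply multiplies samples, threads, and memory, following the arithmetic of the example in the --mem row; it assumes all samples are processed at the same time, and the function name is hypothetical.

```shell
# Estimate the memory (in GB) consumed by samtools tasks:
# samples x threads x mem, as in the --mem example in the table.
estimate_samtools_mem_gb() {
    local samples="$1" threads="$2" mem_gb="$3"
    echo $(( samples * threads * mem_gb ))
}

estimate_samtools_mem_gb 4 4 4    # prints 64, the case from the table
estimate_samtools_mem_gb 4 4 16   # prints 256: 4 samples with the default --threads and --mem
```

If the estimate exceeds the memory of your machine, lower --mem or --threads accordingly.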
For post-processing and figure recreation, please follow the scripts in the post_processing folder.
If you use nf-HiChIP in your research (the idea, the algorithm, the analysis scripts, or the supplemental data), please give us a star on the GitHub repo page and cite our paper as follows:
Preprint bioRxiv:
Jodkowska, K., Parteka-Tojek, Z., Agarwal, A., Denkiewicz, M., Korsak, S., Chiliński, M., Banecki, K., & Plewczynski, D. (2024). Improved cohesin HiChIP protocol and bioinformatic analysis for robust detection of chromatin loops and stripes. In bioRxiv (p. 2024.05.16.594268). https://doi.org/10.1101/2024.05.16.594268