This directory contains the workflow and files for scaffolding: a Snakemake analysis pipeline that streamlines the scaffolding of contig assemblies.
The pipeline consists of the following steps:
- Quality control of Hi-C reads with fastp
- Identify the enzyme site positions, merge reads, and remove duplicates with juicer
- Extract valid pairs with HiC-Pro and JABT
- Scaffold the contigs with 3d-DNA
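As an illustration of the first step, the sketch below prints a fastp command for one paired-end Hi-C library instead of running it, so it can be reviewed before use. The helper name `fastp_cmd`, the sample name, and the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming scheme are assumptions, not part of this pipeline; the actual rule in `scaffold_Snakefile` may use different inputs and options.

```shell
#!/bin/bash
# Hypothetical helper: print the fastp command for one paired-end Hi-C library.
# Assumed naming scheme: <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
fastp_cmd() {
    local sample="$1"
    local threads="${2:-8}"   # worker threads passed to fastp via -w
    printf 'fastp -i %s_R1.fastq.gz -I %s_R2.fastq.gz -o %s_clean_R1.fastq.gz -O %s_clean_R2.fastq.gz -w %s -j %s.fastp.json -h %s.fastp.html\n' \
        "$sample" "$sample" "$sample" "$sample" "$threads" "$sample" "$sample"
}

# Print the command for a hypothetical sample; pipe the output to bash to run it.
fastp_cmd sample1 16
```

Printing the command first keeps the sketch safe to try on a machine where fastp is not installed yet.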
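#!/bin/bash
# SLURM job: reserve one whole node on the amd_256 partition with 64 tasks
#SBATCH --exclusive
#SBATCH -p amd_256
#SBATCH -N 1
#SBATCH -n 64
# Run the scaffolding workflow with up to 32 parallel jobs (-j 32);
# -k keeps independent jobs running if one fails, and --latency-wait 120
# waits up to 120 s for output files to appear on the filesystem
snakemake -s scaffold_Snakefile --stats snake.stats --latency-wait 120 -k -j 32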
Figure 1. Pipeline of genome scaffolding.
Accessions belonging to the BIG or CER groups are scaffolded with RagTag using the Heinz 1706 assembly as the guide, while the remaining accessions, which belong to the PIM group, are guided by the LA2093 assembly.
conda install -c bioconda ragtag
# scaffold a query assembly (BIG or CER group) using Heinz 1706 as the reference
ragtag.py scaffold Heinz_1706.fasta query.fasta
# scaffold a query assembly (PIM group) using LA2093 as the reference
ragtag.py scaffold LA2093.fasta query.fasta
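
The group-to-reference rule described above can be sketched as a small shell helper. This is a minimal sketch, not the pipeline's own code: the function names `pick_reference` and `ragtag_cmd` and the `ragtag_<query>` output directory layout are hypothetical. The `-o` option sets RagTag's output directory, so runs for different accessions do not all write to the default `ragtag_output`.

```shell
#!/bin/bash
# Map an accession group to the reference assembly that guides RagTag.
# Group labels and FASTA names follow the text; function names are hypothetical.
pick_reference() {
    case "$1" in
        BIG|CER) echo "Heinz_1706.fasta" ;;
        PIM)     echo "LA2093.fasta" ;;
        *)       echo "unknown group: $1" >&2; return 1 ;;
    esac
}

# Print the RagTag command for one accession, with a per-query output directory
# (assumed layout: ragtag_<query name without .fasta>).
ragtag_cmd() {
    local group="$1" query="$2" ref
    ref="$(pick_reference "$group")" || return 1
    echo "ragtag.py scaffold -o ragtag_${query%.fasta} ${ref} ${query}"
}

# Print the command for a PIM accession; pipe the output to bash to run it.
ragtag_cmd PIM query.fasta
```

Unknown group labels are rejected with an error rather than silently falling back to one of the references, which is a design choice of this sketch.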