This repository provides a Nextflow pipeline that identifies high-coverage regions in CRAM files, which can be valuable for downstream analyses.
The Accessibility Mask pipeline uses Python scripts and Nextflow to process CRAM files and identify high-coverage regions, which together form the accessibility mask for your genomic data.
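This README does not state how "high coverage" is defined, so the following is only a rough illustration of the idea, not the pipeline's actual method. It assumes per-base read depths are already available as a list and that a region counts as high coverage when every position in it meets a threshold. The names `high_coverage_intervals` and `min_depth` are hypothetical and do not come from the repository.

```python
from typing import Iterable, List, Tuple


def high_coverage_intervals(depths: Iterable[int], min_depth: int) -> List[Tuple[int, int]]:
    """Return half-open, 0-based [start, end) runs where depth >= min_depth."""
    intervals = []
    start = None      # start of the run currently being extended, if any
    position = -1     # stays -1 when `depths` is empty
    for position, depth in enumerate(depths):
        if depth >= min_depth:
            if start is None:
                start = position          # a new run begins here
        elif start is not None:
            intervals.append((start, position))  # the run ended just before this position
            start = None
    if start is not None:
        intervals.append((start, position + 1))  # close a run that reaches the last position
    return intervals


# Made-up depths, purely for illustration.
example_depths = [2, 12, 15, 11, 3, 0, 20, 25]
print(high_coverage_intervals(example_depths, min_depth=10))  # prints [(1, 4), (6, 8)]
```

The half-open, 0-based intervals mirror the BED convention. Whether the pipeline writes its mask in that form is an assumption here.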
To set up and run the pipeline, please follow these instructions:
Python Virtual Environment: Activate your Python virtual environment so that the required Python modules are installed into it and are available to the pipeline.
Install Dependencies: Install the following Python modules using the package manager of your choice:
- pandas
- pysam
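- statistics (part of the Python standard library since Python 3.4, so it normally needs no separate installation)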
- numpy
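Before running the pipeline, it can help to confirm that the modules listed above are visible from the active virtual environment. The following is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical and not part of this repository.

```python
import importlib.util

# Module names taken from the dependency list above.
REQUIRED_MODULES = ["pandas", "pysam", "statistics", "numpy"]


def missing_modules(names):
    """Return the top-level module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it with the virtual environment activated. Any module it reports as missing can then be installed with your package manager.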
Install Nextflow, Samtools, and Tabix: Ensure Nextflow, Samtools, and Tabix are installed on your system. You can find installation instructions for each tool in their respective documentation.
Nextflow Configuration: Place the provided nextflow.config file in the folder where you intend to execute the pipeline. Modify the nextflow.config file based on your specific requirements and settings.
Load Dependencies: Load Nextflow, Samtools, and Tabix in your environment to make them accessible during pipeline execution.
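A quick way to confirm that the tools were loaded is to check whether their executables can be found on your `PATH`. The sketch below assumes the executables are named `nextflow`, `samtools`, and `tabix`, which is their usual naming but may differ on your system. The helper name `missing_tools` is hypothetical.

```python
import shutil

# Assumed executable names for the three command-line tools.
REQUIRED_TOOLS = ["nextflow", "samtools", "tabix"]


def missing_tools(names):
    """Return the executables that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Nextflow, Samtools, and Tabix are all available.")
```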
Job Submission: Submit the job to the Compute Canada cluster using the following command:
sbatch --account="name of the account" --time=168:00:00 --mem=4G -J coverage --wrap="nextflow run /path/to/AccessibilityMask/" -o coverage.slurm.log
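If you submit the job repeatedly with different accounts or resource limits, the same command can be assembled programmatically. This is only a sketch: `build_sbatch_command` is a hypothetical helper, `my-account` is a placeholder account name, and the defaults copy the values from the command above.

```python
import shlex


def build_sbatch_command(account, pipeline_dir, time_limit="168:00:00",
                         memory="4G", job_name="coverage",
                         log_file="coverage.slurm.log"):
    """Build the sbatch argument list that mirrors the command shown above."""
    # The wrapped command runs inside the job; quote the path in case it contains spaces.
    wrapped = "nextflow run " + shlex.quote(pipeline_dir)
    return [
        "sbatch",
        f"--account={account}",
        f"--time={time_limit}",
        f"--mem={memory}",
        "-J", job_name,
        f"--wrap={wrapped}",
        "-o", log_file,
    ]


# "my-account" is a placeholder; substitute your own allocation name.
command = build_sbatch_command("my-account", "/path/to/AccessibilityMask/")
print(shlex.join(command))  # shell-safe string you can review before running
```

To submit from Python, the list could be passed to `subprocess.run(command, check=True)` on a machine where `sbatch` is available.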
Deactivate Virtual Environment: After submitting the job, deactivate your Python virtual environment to restore your original shell environment.