Skip to content

Commit

Permalink
Update Readme
Browse files Browse the repository at this point in the history
Document where to find pipelines to produce input files.
  • Loading branch information
kbseah committed Jan 2, 2023
1 parent 4cb6ed1 commit 28dde62
Show file tree
Hide file tree
Showing 4 changed files with 32 additions and 30 deletions.
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
data/*
tmp/*
envs/*
logs/*
.snakemake
falcon-comb*
45 changes: 22 additions & 23 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,36 +1,35 @@
Template workflow folder for Snakemake pipeline
===============================================
Pogigwasc gene prediction of Loxodes magnus genome
===================================================

After cloning this repository, you should change the name of the folder as
appropriate, and update the remote URL of the repository to a new one for your
project.
Snakemake pipeline for gene prediction for Loxodes magnus, which has a genetic
code with context-dependent stop codons. Introns are first empirically
predicted with [Intronarrator](https://github.com/Swart-lab/Intronarrator) and
artifically removed to produce an "intronless" assembly, to run
[Pogigwasc](https://github.com/Swart-lab/pogigwasc) in `--no-intron` mode. This
is because the short lengths and unusual length distribution of introns in
Loxodes are difficult to model with the GHMM in Pogigwasc.

Data
----

Suggested setup
---------------
Pipeline and scripts to generate the genome assembly are available from
[loxodes-assembly-workflow](https://github.com/Swart-lab/loxodes-assembly-workflow)
repository. Pipeline for the "intronless" assembly is available from
[loxodes-intronarrator-workflow](https://github.com/Swart-lab/loxodes-intronarrator-workflow).

```bash
git clone git@github.com:Swart-lab/snakemake-template.git
mv snakemake-template my-project # rename project folder
cd my-project
mkdir data # folder to put project data, gitignored
mkdir envs # folder for Conda envs produced by workflow, gitignored
mkdir tmp # folder for temp files, gitignored
mkdir nb # folder for computational notebooks etc.
git remote remove origin # remove template repo as a remote
```
This current pipeline was used for annotation of the MAC and MIC genomes; path
to reference assembly and names of output files were modified accordingly.

Edit the files `run_snakemake.sh` and/or `run_snakemake_sge.sh` to include
absolute paths to the working folder and to a Conda environment with Snakemake,
and modify other settings (e.g. max number of CPUs) as required.
Paths to input files in the `workflow/config.yaml` file are local paths used in
the original data analysis. When re-running the pipeline, replace these with
the actual paths on your system.

Snakemake rules and config files are in the `workflow/` subfolder.
Curated output from this annotation are included in the [archive of genome
annotations](https://doi.org/10.17617/3.9QTROS).


Running workflow
----------------

To run on a local server, use `./run_snakemake.sh` script, and add rule names
and additional parameters, e.g. `./run_snakemake.sh --dryrun`.

[Documentation for `run_snakemake_sge.sh` TK]
6 changes: 3 additions & 3 deletions run_snakemake.sh
Original file line number Diff line number Diff line change
Expand Up @@ -11,14 +11,14 @@ set -e
# * Conda environments will be created in a subfolder `envs/`

# PATHS
SNAKEMAKE_ENV=
WD=
SNAKEMAKE_ENV=/ebio/ag-swart/home/kbseah/anaconda3/envs/snakemake
WD=/ebio/abt2_projects/ag-swart-loxodes/annotation/falcon-comb_LmagMIC/pogigwasc_intronless

# activate snakemake conda environment
source activate $SNAKEMAKE_ENV

snakemake \
--cores 24 \
--cores 16 \
--configfile $WD/workflow/config.yaml \
--use-conda \
--conda-frontend mamba \
Expand Down
8 changes: 4 additions & 4 deletions workflow/config.yaml
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
falcon-comb_LmagMIC:
ref_orig:
ref_orig: # Original reference genome assembly
/ebio/abt2_projects/ag-swart-loxodes/assembly/falcon-comb_LmagMIC/scaffolds.fasta
ref_intronless_masked:
ref_intronless_masked: # Intronless assembly produced by Intronarrator
/ebio/abt2_projects/ag-swart-loxodes/annotation/falcon-comb_LmagMIC/intronarrator/falcon-comb_LmagMIC.0.2.minus_introns.ncRNA_hard_masked.fa
realtrons_gff:
realtrons_gff: # Intron annotation GFF3 file produced by intronarrator
/ebio/abt2_projects/ag-swart-loxodes/annotation/falcon-comb_LmagMIC/intronarrator/all.realtrons.0.2.noalt.gff
trf_min1000:
trf_min1000: # Low-complexity sequence annotation GFF3
/ebio/abt2_projects/ag-swart-loxodes/annotation/falcon-comb_LmagMIC/trf/falcon-comb_LmagMIC.trf.no_overlap.min1000.merge.bed

0 comments on commit 28dde62

Please sign in to comment.