Document where to find pipelines to produce input files.
Showing 4 changed files with 32 additions and 30 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,6 @@ | ||
data/* | ||
tmp/* | ||
envs/* | ||
logs/* | ||
.snakemake | ||
falcon-comb* |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,36 +1,35 @@ | ||
Template workflow folder for Snakemake pipeline | ||
=============================================== | ||
Pogigwasc gene prediction of Loxodes magnus genome | ||
=================================================== | ||
|
||
After cloning this repository, you should change the name of the folder as | ||
appropriate, and update the remote URL of the repository to a new one for your | ||
project. | ||
Snakemake pipeline for gene prediction for Loxodes magnus, which has a genetic | ||
code with context-dependent stop codons. Introns are first empirically | ||
predicted with [Intronarrator](https://github.com/Swart-lab/Intronarrator) and | ||
artificially removed to produce an "intronless" assembly, to run | ||
[Pogigwasc](https://github.com/Swart-lab/pogigwasc) in `--no-intron` mode. This | ||
is because the short lengths and unusual length distribution of introns in | ||
Loxodes are difficult to model with the GHMM in Pogigwasc. | ||
|
||
Data | ||
---- | ||
|
||
Suggested setup | ||
--------------- | ||
The pipeline and scripts to generate the genome assembly are available from the | ||
[loxodes-assembly-workflow](https://github.com/Swart-lab/loxodes-assembly-workflow) | ||
repository. The pipeline for the "intronless" assembly is available from | ||
[loxodes-intronarrator-workflow](https://github.com/Swart-lab/loxodes-intronarrator-workflow). | ||
|
||
```bash | ||
git clone git@github.com:Swart-lab/snakemake-template.git | ||
mv snakemake-template my-project # rename project folder | ||
cd my-project | ||
mkdir data # folder to put project data, gitignored | ||
mkdir envs # folder for Conda envs produced by workflow, gitignored | ||
mkdir tmp # folder for temp files, gitignored | ||
mkdir nb # folder for computational notebooks etc. | ||
git remote remove origin # remove template repo as a remote | ||
``` | ||
This pipeline was used to annotate both the MAC and MIC genomes; the path to | ||
the reference assembly and the names of output files were modified accordingly. | ||
|
||
Edit the files `run_snakemake.sh` and/or `run_snakemake_sge.sh` to include | ||
absolute paths to the working folder and to a Conda environment with Snakemake, | ||
and modify other settings (e.g. max number of CPUs) as required. | ||
Paths to input files in the `workflow/config.yaml` file are local paths used in | ||
the original data analysis. When re-running the pipeline, replace these with | ||
the actual paths on your system. | ||
|
||
Snakemake rules and config files are in the `workflow/` subfolder. | ||
Curated output from this annotation is included in the [archive of genome | ||
annotations](https://doi.org/10.17617/3.9QTROS). | ||
|
||
|
||
Running workflow | ||
---------------- | ||
|
||
To run on a local server, use `./run_snakemake.sh` script, and add rule names | ||
and additional parameters, e.g. `./run_snakemake.sh --dryrun`. | ||
|
||
[Documentation for `run_snakemake_sge.sh` TK] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,9 @@ | ||
falcon-comb_LmagMIC: | ||
ref_orig: | ||
ref_orig: # Original reference genome assembly | ||
/ebio/abt2_projects/ag-swart-loxodes/assembly/falcon-comb_LmagMIC/scaffolds.fasta | ||
ref_intronless_masked: | ||
ref_intronless_masked: # Intronless assembly produced by Intronarrator | ||
/ebio/abt2_projects/ag-swart-loxodes/annotation/falcon-comb_LmagMIC/intronarrator/falcon-comb_LmagMIC.0.2.minus_introns.ncRNA_hard_masked.fa | ||
realtrons_gff: | ||
realtrons_gff: # Intron annotation GFF3 file produced by Intronarrator | ||
/ebio/abt2_projects/ag-swart-loxodes/annotation/falcon-comb_LmagMIC/intronarrator/all.realtrons.0.2.noalt.gff | ||
trf_min1000: | ||
trf_min1000: # Low-complexity sequence annotation GFF3 | ||
/ebio/abt2_projects/ag-swart-loxodes/annotation/falcon-comb_LmagMIC/trf/falcon-comb_LmagMIC.trf.no_overlap.min1000.merge.bed |
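
Because the README asks users to replace the local paths in `workflow/config.yaml` with paths on their own system, a quick check before running the pipeline may help catch entries that were missed. The following is a minimal sketch, not part of the repository: it assumes the config has already been parsed into a nested dictionary (assembly name, then key, then path), and both the helper name `missing_inputs` and the placeholder paths are hypothetical.

```python
from pathlib import Path


def missing_inputs(config):
    """Return (assembly, key, path) tuples for config entries whose file is absent.

    `config` is assumed to mirror the layout of workflow/config.yaml:
    {assembly_name: {key: path_string, ...}, ...}
    """
    missing = []
    for assembly, entries in config.items():
        for key, path in entries.items():
            if not Path(path).is_file():
                missing.append((assembly, key, path))
    return missing


# Hypothetical stand-in for the parsed config; replace with your own paths.
example_config = {
    "falcon-comb_LmagMIC": {
        "ref_orig": "/path/to/scaffolds.fasta",
        "ref_intronless_masked": "/path/to/minus_introns.ncRNA_hard_masked.fa",
        "realtrons_gff": "/path/to/all.realtrons.noalt.gff",
        "trf_min1000": "/path/to/trf.no_overlap.min1000.merge.bed",
    }
}

# Report every entry that does not point to an existing file.
for assembly, key, path in missing_inputs(example_config):
    print(f"{assembly}: {key} not found at {path}")
```

An empty result means every listed file exists; it does not verify that the files have the expected format or content.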