diff --git a/README.md b/README.md
index de5f6e0c..8e1ead23 100644
--- a/README.md
+++ b/README.md
@@ -58,17 +58,18 @@ maginator ... --cluster qsub --cluster_info "-l nodes=1:ppn={cores}:thinnode,mem
 
 ## Test data
 
-A test set can be found in the test_data directory.
+A test set can be found in the maginator/test_data directory.
 1. Download the 3 samples used for the test at SRA: https://www.ncbi.nlm.nih.gov/sra?LinkName=bioproject_sra_all&from_uid=715601 with the ID's dfc99c_A, f9d84e_A and 221641_A
-2. Change the paths to the read-files in reads.csv
-3. Unzip the contigs.fasta.gz
-4. Run MAGinator
+2. Clone repo: git clone https://github.com/Russel88/MAGinator.git
+3. Change the paths to the read-files in reads.csv
+4. Unzip the contigs.fasta.gz
+5. Run MAGinator
 
 MAGinator has been run on the test data on a slurm server with the following command:
-```
+```sh
 maginator --vamb_clusters clusters.tsv --reads reads.csv --contigs contigs.fasta --gtdb_db data/release207_v2/ --output test_out --cluster slurm --cluster_info "-n {cores} --mem {mem_gb}gb -t {runtime}" --max_mem 180
 ```
-The expected output can be found in test_data/test_out (excluding the GTDB-tk folders, phylogeny alignments and BAM-files due to size limitations)
+The expected output can be found as a zipped file on Zenodo: https://doi.org/10.5281/zenodo.8279036
 
 ## Recommended workflow
 
@@ -88,14 +89,23 @@ sed 's/@/_/g' vamb/clusters.tsv > clusters.tsv
 
 Now you are ready to run MAGinator.
 
+## Functional Annotation
+
 To generate the functional annotation of the genes we recommend using EggNOG mapper (https://github.com/eggnogdb/eggnog-mapper).
 You can download it and try to run it on the test data
-```
+```sh
 mkdir test_out/functional_annotation
 emapper.py -i test/genes/all_genes_rep_seq.fasta --output test_out/functional_annotation -m diamond --cpu 38
 ```
 
 
+The eggNOG output can be merged with clusters.tsv and further processed to obtain functional annotations of the MAG, cluster or sample levels with the following command:
+```sh
+(echo -e '#sample\tMAG_cluster\tMAG\tfunction'; join -1 1 -2 1 <(awk '{print $2 "\t" $1}' clusters.tsv | sort) <(tail -n +6 annotations.tsv | head -n -3 | cut -f1,15 | grep -v '\-$' | sed 's/_[[:digit:]]\+\t/\t/' | sed 's/,/\n/g' | perl -lane '{$q = $F[0] if $#F > 0; unshift(@F, $q) if $#F == 0}; print "$F[0]\t$F[1]"' | sed 's/\tko:/\t/' | sort) | awk '{print $2 "\t" $2 "\t" $3}' | sed 's/_/\t/' | sort -k1,1 -k2,2n) > MAGfunctions.tsv
+```
+In this case the KEGG ortholog column 15 was picked from the eggNOG-mapper output. But by cutting e.g. column number 13, one would obtain GO terms instead. Refer to the header of the eggNOG-mapper output for other available functional annotations e.g. KEGG pathways, Pfam, CAZy, COGs, etc.
+
+
 ## MAGinator workflow
 
 This is what MAGinator does with your input (if you want to see all parameters run maginator --help):
diff --git a/conda_build/meta.yaml b/conda_build/meta.yaml
deleted file mode 100644
index 0ac8074f..00000000
--- a/conda_build/meta.yaml
+++ /dev/null
@@ -1,31 +0,0 @@
-{% set name = "maginator" %}
-{% set version = "0.0.1" %}
-
-package:
-  name: "{{ name|lower }}"
-  version: "{{ version }}"
-
-source:
-  url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/{{ name }}-{{ version }}.tar.gz
-  sha256:
-
-requirements:
-  host:
-    - pip
-    - python >=3.5
-  run:
-    - python >=3.5
-    - snakemake
-    - mamba
-
-test:
-  commands:
-    - maginator -h
-
-about:
-  home: https://github.com/Russel88/MAGinator
-  license: MIT
-  summary: MAGinator - Accurate strain and functional profiling of MAGs
-
-build:
-  number: 0
diff --git a/maginator/recommended_workflow/envs/checkm.yaml b/maginator/recommended_workflow/envs/checkm.yaml
new file mode 100644
index 00000000..7bcc493b
--- /dev/null
+++ b/maginator/recommended_workflow/envs/checkm.yaml
@@ -0,0 +1,5 @@
+name: checkm-genome
+channels:
+  - bioconda
+dependencies:
+  - checkm-genome
diff --git a/maginator/recommended_workflow/envs/import_hg_19.R b/maginator/recommended_workflow/envs/import_hg_19.R
new file mode 100644
index 00000000..f1eba1c6
--- /dev/null
+++ b/maginator/recommended_workflow/envs/import_hg_19.R
@@ -0,0 +1,4 @@
+library(BSgenome.Hsapiens.UCSC.hg19.masked)
+genome <- BSgenome.Hsapiens.UCSC.hg19
+out_file <- file.path(snakemake@output[["hg19"]])
+export(genome, out_file)
diff --git a/maginator/recommended_workflow/envs/metabat2.yaml b/maginator/recommended_workflow/envs/metabat2.yaml
new file mode 100644
index 00000000..cbf24cd8
--- /dev/null
+++ b/maginator/recommended_workflow/envs/metabat2.yaml
@@ -0,0 +1,5 @@
+name: metabat2
+channels:
+  - bioconda/label/cf201901
+dependencies:
+  - metabat2
diff --git a/maginator/recommended_workflow/envs/preprocess.yaml b/maginator/recommended_workflow/envs/preprocess.yaml
new file mode 100644
index 00000000..34dedcac
--- /dev/null
+++ b/maginator/recommended_workflow/envs/preprocess.yaml
@@ -0,0 +1,13 @@
+channels:
+  - bioconda
+  - conda-forge
+  - r
+dependencies:
+  - biopython=1.79
+  - pandas=1.4
+  - bbmap=38.96
+  - sickle-trim=1.33
+  - spades=3.15.5
+  - samtools=1.10
+  - bwa-mem2=2.2.1
+  - bioconductor-bsgenome.hsapiens.ucsc.hg19.masked=1.3.993
diff --git a/maginator/recommended_workflow/envs/samtools.yaml b/maginator/recommended_workflow/envs/samtools.yaml
new file mode 100644
index 00000000..61dfbc38
--- /dev/null
+++ b/maginator/recommended_workflow/envs/samtools.yaml
@@ -0,0 +1,5 @@
+name: samtools
+channels:
+  - bioconda
+dependencies:
+  - samtools
diff --git a/maginator/recommended_workflow/envs/vamb.yaml b/maginator/recommended_workflow/envs/vamb.yaml
new file mode 100644
index 00000000..381c2c3f
--- /dev/null
+++ b/maginator/recommended_workflow/envs/vamb.yaml
@@ -0,0 +1,14 @@
+name: vamb
+channels:
+  - pytorch
+  - conda-forge
+  - bioconda
+dependencies:
+  - pytorch
+  - pip
+  - torchvision
+  - cudatoolkit=10.2
+  - pysam
+  - numpy=1.20
+  - pip:
+    - git+https://github.com/RasmussenLab/vamb@v3.0.8
diff --git a/maginator/workflow/envs/phylo.yaml b/maginator/workflow/envs/phylo.yaml
index 6912079b..30076ed0 100644
--- a/maginator/workflow/envs/phylo.yaml
+++ b/maginator/workflow/envs/phylo.yaml
@@ -1,6 +1,6 @@
 channels:
-  - bioconda
   - conda-forge
+  - bioconda
   - biobuilds
 dependencies:
   - biopython=1.79
diff --git a/package.sh b/package.sh
index 1a193a37..7a16d87a 100644
--- a/package.sh
+++ b/package.sh
@@ -1,4 +1,8 @@
+# New version
+## 1) Update version in setup.py and commit and push
+## 2) Pull request of dev into main
+## 3) Make release on GitHub
+## 4) Run this code:
 rm -r maginator.egg-info/ dist/ build/
 python setup.py sdist
-python setup.py install
 twine upload dist/*
diff --git a/setup.py b/setup.py
index 6e4bec9c..3ae07d0c 100644
--- a/setup.py
+++ b/setup.py
@@ -5,7 +5,7 @@ setuptools.setup(
 
 
     name="maginator",
-    version="0.1.17",
+    version="0.1.18",
     author="Jakob Russel & Trine Zachariasen",
     author_email="russel2620@gmail.com,trine_zachariasen@hotmail.com",
     description="MAGinator: Abundance, strain, and functional profiling of MAGs",
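
Review note on the README's new Functional Annotation one-liner: the shell pipeline is dense, so here is a minimal Python sketch of what it appears to do, as a reading aid and not as part of the patch. The function names `parse_clusters` and `mag_functions` are hypothetical. The sketch assumes that clusters.tsv holds `cluster<TAB>contig` rows with cluster names of the form `sample_id` (after the `sed 's/@/_/g'` step), that gene IDs are the contig name plus a `_<number>` suffix, and that comment lines in the eggNOG-mapper output start with `#` (the shell version drops them with `tail`/`head` instead). The default column 15 is taken from the README text and should be checked against the header of your own output.

```python
import re


def parse_clusters(lines):
    """Map contig name -> cluster name from 'cluster<TAB>contig' lines."""
    contig_to_cluster = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            contig_to_cluster[fields[1]] = fields[0]
    return contig_to_cluster


def mag_functions(cluster_lines, annotation_lines, column=15):
    """Return (sample, MAG_cluster, MAG, function) rows.

    `column` is the 1-based annotation column to extract (assumed layout).
    """
    contig_to_cluster = parse_clusters(cluster_lines)
    rows = []
    for line in annotation_lines:
        if line.startswith("#"):  # assumption: header/footer lines start with '#'
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < column or fields[column - 1] in ("", "-"):
            continue  # gene without an annotation in this column
        # Strip the trailing gene index to recover the contig name.
        contig = re.sub(r"_\d+$", "", fields[0])
        cluster = contig_to_cluster.get(contig)
        if cluster is None:
            continue  # like `join`, drop genes on contigs that are not binned
        # Split on the first underscore only, as `sed 's/_/\t/'` does.
        sample, _, cluster_id = cluster.partition("_")
        for term in fields[column - 1].split(","):
            if term.startswith("ko:"):
                term = term[3:]
            rows.append((sample, cluster_id, cluster, term))
    # Mirror `sort -k1,1 -k2,2n`: sample as text, cluster id as a number.
    rows.sort(key=lambda r: (r[0], int(r[1]) if r[1].isdigit() else 0))
    return rows
```

To produce a file comparable to MAGfunctions.tsv, one could open clusters.tsv and the annotation file, pass both file objects to `mag_functions`, and write each returned tuple joined by tabs below a `#sample\tMAG_cluster\tMAG\tfunction` header line.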