Replies: 3 comments
-
Tagging @Dmitry-Antipov
-
I know this is an older question, but since there are no responses: I'm also interested in a resolution or some discussion of this.
-
Hi,
If you want only complete viral sequences, use metaviralSPAdes. If you do not want to miss anything, use metaSPAdes. Instead of rerunning metaSPAdes, you can use before_rr.fasta from the metaviralSPAdes run, but those sequences can be somewhat less contiguous than metaSPAdes' output. I now think the "best possible" approach would be to combine the metaSPAdes and metaviralSPAdes outputs, purge duplicates, and then run a virus detection tool on the combined output, but this is not implemented directly.
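For anyone who wants to try the combine-and-purge idea by hand, here is a minimal sketch. It is not part of SPAdes, and it only drops exact duplicates (same sequence or its reverse complement). It will not catch contained, overlapping, or rotated circular contigs, for which a dedicated clustering/dereplication tool would be needed. The function names and the `label|header` naming are my own assumptions for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

# Translation table for reverse-complementing plain nucleotide sequences.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def canonical(seq: str) -> str:
    """Strand-independent key: the smaller of the sequence and its reverse complement."""
    seq = seq.upper()
    revcomp = seq.translate(_COMPLEMENT)[::-1]
    return min(seq, revcomp)


def merge_unique(*labelled_sets) -> List[Tuple[str, str]]:
    """Merge (label, records) sets, keeping the first copy of each exact duplicate."""
    seen = set()
    merged = []
    for label, records in labelled_sets:
        for header, seq in records:
            key = canonical(seq)
            if key in seen:
                continue  # exact duplicate (either strand) already kept
            seen.add(key)
            merged.append((f"{label}|{header}", seq))
    return merged
```

In use, you would open the contig FASTA files from the two runs, pass `read_fasta(handle)` for each to `merge_unique`, write the result out, and run your virus detection tool of choice on that file. Listing the metaviralSPAdes set first means the complete viral contigs keep their own label when a sequence appears in both outputs.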
-
Description of bug
Hi,
Perhaps I have misunderstood, but the paper seems to say that metaviral-spades also outputs partial contigs from metaspades?
"To give users an option to examine both complete viral sequences (identified based on analyzing small subgraphs of the METASPADES assembly graphs) and partial viral sequences (corresponding to METASPADES contigs), the VIRALASSEMBLY output is combined with the regular METASPADES output."
"we compared VIRALASSEMBLY against METASPADES on 18 real datasets described in Supplementary Table S3. We analyzed only complete (i.e. circular contigs or linear contigs starting in sources and ending in sinks) and high-coverage (>5×) sequences for benchmarking (VIRALASSEMBLY and METASPADES report the same set of partial contigs)"
However, another issue mentioned that metaviral-spades can only output "complete" viral sequences when analyzing a metagenomic dataset. I have also found that with metaviral-spades many datasets yield only a single-digit number of scaffolds, and for 25 of the 100 metagenomic test datasets I downloaded, errors occurred due to the absence of complete sequences. Compared to metaspades, metaviral-spades assembles several times fewer virus sequences and several times fewer high-quality virus sequences (as evaluated by CheckV and genomad).
In the above-mentioned issue 1106, the author suggested using metaspades to reassemble the metagenome reads because metaviral-spades outputs only complete genomes. This puzzles me, because complete genomes seem to be very rare in metagenomes, and outputting only them may not meet the needs of metagenomic assembly. I would therefore like to ask what the usual purposes and circumstances for using metaviral-spades are. In other words, when I have a metagenomic dataset and want to assemble virus sequences for my analysis, how should I choose between metaspades and metaviral-spades?
Thanks!
spades.log
Just a question, not a bug
params.txt
spades.py --metaviral -1 test_1.fastq -2 test_2.fastq -t 7 -o spades
SPAdes version
v3.15.5
Operating System
CentOS 3.10.0-1160.15.2.el7.x86_64
Python Version
No response
Method of SPAdes installation
download the github release
No errors reported in spades.log