Higher coverage - lower contiguity? #1058
This is not unexpected. Note that average coverage is only loosely connected to the contiguity of the assembly. What matters is the coverage of every point of the genome. So, depending on the protocol, you might end up with e.g. non-uniform coverage and coverage gaps with coverage spikes in between. Next, subsampling, if done improperly (e.g. via so-called "digital normalization"), can increase N50 at the price of an elevated number of misassemblies (think of removing one instance of a repeat: the other instance is then automatically "resolved"). On the other hand, subsampling might remove some high-coverage chimeric connections, making the assembler's life easier, etc. Lots of things contribute to the assembly, and the target landscape is multidimensional, so there is no single easy answer here :)
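To make "coverage of every point of the genome" concrete, here is a minimal sketch (not from the original reply) that summarizes per-base depth from `samtools depth -a` output. A high coefficient of variation or a non-trivial fraction of zero-coverage positions points at exactly the gaps-and-spikes pattern described above. The file name and the three-column layout (chromosome, position, depth) follow `samtools depth` conventions; everything else is an illustrative assumption.

```python
# Sketch: summarize per-base coverage uniformity from
#   samtools depth -a alignments.bam > depth.txt
import statistics
import sys

def coverage_stats(depth_file):
    depths = []
    with open(depth_file) as handle:
        for line in handle:
            # samtools depth emits: chromosome, 1-based position, depth
            _, _, depth = line.split("\t")
            depths.append(int(depth))
    if not depths:
        raise SystemExit("empty depth file")
    mean = statistics.mean(depths)
    stdev = statistics.pstdev(depths)
    zero = sum(1 for d in depths if d == 0)
    return {
        "mean_coverage": mean,
        "coefficient_of_variation": stdev / mean if mean else float("inf"),
        "fraction_zero_coverage": zero / len(depths),
    }

if __name__ == "__main__":
    print(coverage_stats(sys.argv[1]))
```

Two libraries with the same mean coverage can look very different by these measures, which is why the mean alone does not predict contiguity.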
Description of bug
Hello,
I have asked many people about this issue but haven't gotten an answer. I'm working with two libraries for the same species: A, from TruSeq Nano with mean read coverage of ~25x, and B, from TruSeq PCR-free with mean coverage of ~120x. I filtered the reads for quality, removed contaminants, and finally assembled with SPAdes. To my surprise, the assembly of A turned out more contiguous than that of B (N50 of 100 kb vs 40 kb), and subsampling the B reads improved N50 after re-assembly!
Is this expected behaviour for SPAdes? Why does it happen?
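Not part of the original question, but for concreteness: a minimal sketch of uniform random subsampling of a paired-end library, which is one way the subsampling mentioned above could be done (a tool such as `seqtk sample` achieves the same thing). File names, fraction, and seed are illustrative assumptions, not values taken from this thread.

```python
# Sketch: keep each read pair with a fixed probability, preserving pairing.
import gzip
import random

def subsample_pairs(r1_in, r2_in, r1_out, r2_out, fraction, seed=42):
    rng = random.Random(seed)
    with gzip.open(r1_in, "rt") as f1, gzip.open(r2_in, "rt") as f2, \
         gzip.open(r1_out, "wt") as o1, gzip.open(r2_out, "wt") as o2:
        while True:
            rec1 = [f1.readline() for _ in range(4)]  # one FASTQ record = 4 lines
            rec2 = [f2.readline() for _ in range(4)]
            if not rec1[0]:
                break
            if rng.random() < fraction:  # keep both mates together
                o1.writelines(rec1)
                o2.writelines(rec2)

# Hypothetical file names and fraction, for illustration only
subsample_pairs("B_R1.fastq.gz", "B_R2.fastq.gz",
                "B_sub_R1.fastq.gz", "B_sub_R2.fastq.gz", fraction=0.2)
```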
spades.log
params.txt
SPAdes version
SPAdes v3.13.0
Operating System
CentOS Linux 8.1
Python Version
No response
Method of SPAdes installation
conda
No errors reported in spades.log