number of Illumina reads for transcriptome assembly #850
-
Hi there! I am performing a de novo transcriptome assembly of a non-model organism. To do that, we have sequenced RNA with ONT and Illumina, and we are going to build the transcriptome using the hybrid approach. I was wondering whether, in the hybrid approach, rnaSPAdes requires a certain number of Illumina reads (for instance, 200 million fragments), or whether the more I sequence, the better the results will be. Best regards,
Replies: 3 comments
-
Dear Giulia, there is no fixed limit on the number of short reads you provide. However, low-coverage datasets (say, below 10-15 M reads) may give sub-optimal results. Very high-coverage datasets, on the other hand, may take a long time to assemble. In our experience, 50-200 M reads is a good balance for a typical eukaryotic transcriptome. Best, Andrey
-
Thank you, Andrey, for the fast answer! I was wondering: if we have sequenced more than 200 M fragments, for example 1 B fragments, could using the entire dataset introduce some noise into the transcriptome assembly? Best,
-
I cannot say for sure, as we have never assembled 1 B reads. But yes, extremely high-coverage datasets may result in some excess transcript sequences. As for down-sampling: I personally have used ordinary random sub-sampling, but some users have tried a normalization approach as well. I am not sure about the results; I have received some conflicting feedback on normalization procedures. Best
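For anyone who wants to try the random sub-sampling mentioned above, here is a minimal sketch in plain Python. It is not part of SPAdes and not necessarily how the sub-sampling was done in practice. It assumes uncompressed paired-end FASTQ files with mates in the same order in both files; the function names and file paths are hypothetical. Each pair is kept with probability `fraction`, so going from about 1 B pairs to about 200 M would mean `fraction=0.2`, and the final count is only approximately the target.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as tuples of 4 raw lines."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(r1_path, r2_path, out1_path, out2_path, fraction, seed=0):
    """Keep each read pair with probability `fraction`.

    Both mates of a pair are kept or dropped together, so the two
    output files stay in sync. Returns (pairs_seen, pairs_kept).
    Assumption: both inputs contain the same number of records;
    zip() stops silently at the shorter file.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    seen = kept = 0
    with open(r1_path) as f1, open(r2_path) as f2, \
            open(out1_path, "w") as o1, open(out2_path, "w") as o2:
        for rec1, rec2 in zip(read_fastq(f1), read_fastq(f2)):
            seen += 1
            if rng.random() < fraction:
                o1.writelines(rec1)
                o2.writelines(rec2)
                kept += 1
    return seen, kept
```

A call might look like `subsample_pairs("R1.fastq", "R2.fastq", "R1.sub.fastq", "R2.sub.fastq", fraction=0.2)`. The files are streamed, so memory use stays constant regardless of dataset size; for gzipped input, `gzip.open(path, "rt")` could be swapped in for `open`.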