number of Illumina reads for transcriptome assembly #850

glopatriello · 2021-10-06T08:25:50Z

Hi there!

I am performing a de novo transcriptome assembly of a non-model organism. To do that, we have sequenced RNA with ONT and Illumina, and we are going to build the transcriptome using the hybrid approach.

I was wondering whether, in the hybrid approach, rnaSPAdes requires a certain number of sequenced Illumina reads (for instance, 200 million fragments), or whether the more I sequence, the better the results will be.

Best regards,
Giulia

Answered by andrewprzh

andrewprzh · 2021-10-06T10:57:48Z

andrewprzh
Oct 6, 2021
Maintainer

Dear Giulia

There is no fixed limit on the number of short reads you can provide. However, low-coverage datasets (let's say below 10-15 M reads) may give sub-optimal results. Very high-coverage datasets, on the other hand, may take a long time to assemble. In our experience, 50-200 M reads is a good balance for a typical eukaryotic transcriptome.

Best
Andrey

glopatriello · 2021-10-07T12:21:50Z

glopatriello
Oct 7, 2021
Author

Thank you Andrey for the fast answer!

I was wondering: if we have sequenced more than 200 M fragments, for example 1 B fragments, could using the entire dataset introduce some noise into the transcriptome assembly?
In addition, to sub-sample the reads, do you suggest simply sub-sampling by number of input reads, or is it better to use a read normalization approach?

Best,
Giulia

andrewprzh · 2021-10-07T12:49:35Z

andrewprzh
Oct 7, 2021
Maintainer

I cannot say for sure, as we have never assembled 1 B reads. But yes, extremely high-coverage datasets may result in some excess transcript sequences.

As for down-sampling: I personally have used ordinary random sub-sampling, but some users have tried the normalization approach as well. I am not sure about the results; I have received mixed feedback on normalization procedures.
In any case, rnaSPAdes is designed to handle non-uniform coverage, and ordinary sub-sampling should work just fine (and is probably faster than normalization). Thus, I'd stick with the simple one.

Best
Andrey
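
For readers who want to try the plain random sub-sampling mentioned above, here is a minimal sketch in Python. It is not part of rnaSPAdes and not the maintainer's own script; the function names `read_fastq` and `subsample_pairs` and all file names are hypothetical. It assumes paired-end FASTQ files with exactly four lines per record and mates in the same order in both files, and it keeps each pair with a fixed probability, so the number of pairs kept is only approximately `fraction * total`.

```python
import gzip
import random
from itertools import islice, zip_longest


def read_fastq(path):
    """Yield FASTQ records as lists of 4 lines (plain or .gz input)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            record = list(islice(handle, 4))
            if not record:
                return
            if len(record) != 4:
                raise ValueError(f"truncated FASTQ record in {path}")
            yield record


def subsample_pairs(r1_in, r2_in, r1_out, r2_out, fraction, seed=0):
    """Keep each read pair with probability `fraction`, mates together.

    Streams both files once, so memory use does not grow with input size.
    Returns (total_pairs_seen, pairs_kept).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    total = kept = 0
    with open(r1_out, "w") as out1, open(r2_out, "w") as out2:
        pairs = zip_longest(read_fastq(r1_in), read_fastq(r2_in))
        for rec1, rec2 in pairs:
            if rec1 is None or rec2 is None:
                raise ValueError("R1 and R2 contain different numbers of reads")
            total += 1
            # One random draw per pair, so both mates are kept or dropped.
            if rng.random() < fraction:
                out1.writelines(rec1)
                out2.writelines(rec2)
                kept += 1
    return total, kept
```

As a usage example under the numbers discussed in this thread: going from roughly 1 B fragments to roughly 200 M corresponds to `fraction = 200e6 / 1e9 = 0.2`, e.g. `subsample_pairs("R1.fastq.gz", "R2.fastq.gz", "sub_R1.fastq", "sub_R2.fastq", 0.2)` with placeholder file names. The output is written uncompressed to keep the sketch short.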
