
Mistake in choosing parallel contigs from embplant_pt-embplant_mt.fastg: mt vs pt #215

Open
365gemini opened this issue Nov 24, 2022 · 5 comments

Comments

@365gemini
Copy link

Hi Dr Jin,

I have already assembled my data into a circular plastome, but when I mapped reads to the sequence I found one segment in the IR regions with much lower coverage than the other regions.
[Screenshot (WeChat image 20221124212014): read coverage mapped onto the assembled plastome]

After checking the mt.fastg file, I found that GetOrganelle chose contig #61416 (coverage 9.46x) rather than #61026 (coverage 158.3x) when it disentangled the parallel contigs.
[Screenshot (WeChat image 20221124212853): contigs #61416 and #61026 in the mt.fastg graph]

I have also seen this in the same region in some other samples. Sometimes two samples of the same species give different disentangling results: one behaves as described above, while the other shows uniform coverage. Although I have found a way to edit the mt.fastg file manually, I have many plastomes to check.

I'm wondering why this happens, and whether there is a quick way to fix it. Could you kindly give any suggestions?

Thank you.

Best,
Zhi Yang
zhiyang@njfu.edu.cn

I have attached the related files here:
1477.zip

@Kinggerm
Owner

Kinggerm commented Nov 24, 2022

Thanks for reaching out with such detailed information! This is a great example of an issue report.

Ideally, this should be avoided.

However, contig 61416 (9.46x) has better BLAST hits to two genes (partial ycf2 and ycf15), whereas contig 61026 (158.3x) hits only partial ycf2, so the current default algorithm preferred contig 61416. If you run an online BLAST, you may find that contig 61416 aligns to chloroplast genomes, which suggests it may be a recent transfer of pt sequence into the mt genome that has not merged with other mt loci. In general this is a complex problem, because in the opposite situation BLAST hits can indeed matter more than depths when classifying contig type. A possible future solution for GetOrganelle may be an integrated likelihood framework that weighs BLAST hits and depths together, rather than the multi-step framework implemented currently. @wbyu Let's leave this issue open until it can be solved without parameter fine-tuning.

For now, a simple solution is to use get_organelle_from_assembly.py to extract embplant_pt from the post-slimmed fastg file with a stricter (i.e. smaller) --depth-factor value, which sets a smaller tolerance for the depth difference between pt and mt. The default is 10. In my tests, values from 3 to 7 work well for this sample. I would try 3 for all your samples, e.g.,

get_organelle_from_assembly.py -g extended_K127.assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg -F embplant_pt --depth-factor 3.0 -o output_dir
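Since many plastomes need checking, the command above can be batched. The following is a minimal dry-run sketch, not part of GetOrganelle: it assumes (hypothetically) that each sample directory contains the slimmed graph under the same file name as in the command above, and the helper name `run_sweep`, the `DRY_RUN` switch and the `pt_df<factor>` output naming are illustrative inventions. It only uses the flags already shown above. The candidate values 3.0, 5.0 and 7.0 come from the suggested range; 1.5 is the stricter value that later worked for one failed sample in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical batch helper (not part of GetOrganelle): re-extract embplant_pt
# from a sample's slimmed pt+mt graph at several --depth-factor values.

# Assumed file name, copied from the command above; adjust to your layout.
FASTG_NAME="extended_K127.assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg"
# Candidate values taken from this thread (default is 10; smaller = stricter).
DEPTH_FACTORS=(3.0 5.0 7.0 1.5)

# run_sweep SAMPLE_DIR
# With DRY_RUN=1 (the default) only print each command; with DRY_RUN=0 run it.
run_sweep() {
  local sample_dir=$1 factor
  for factor in "${DEPTH_FACTORS[@]}"; do
    local cmd=(get_organelle_from_assembly.py
      -g "${sample_dir}/${FASTG_NAME}"
      -F embplant_pt
      --depth-factor "$factor"
      -o "${sample_dir}/pt_df${factor}")
    if [[ "${DRY_RUN:-1}" == 1 ]]; then
      # Dry run: show the command that would be executed.
      printf '%s\n' "${cmd[*]}"
    else
      # Real run: report failures but keep going with the next value.
      "${cmd[@]}" || echo "failed: ${sample_dir} (depth factor ${factor})" >&2
    fi
  done
}
```

For example, `for d in sample_*/; do run_sweep "${d%/}"; done` prints the commands for every matching directory, and `DRY_RUN=0 run_sweep sample_01` would actually run them for one sample. Keeping each factor in its own output directory makes it easy to compare the results side by side before settling on a value.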

Please let me know if it makes sense.

  • Extra tip1: since you have GetOrganelle installed, you can use evaluate_assembly_using_mapping.py -f assembly_fasta_file -c yes -1 extended_1.fq -2 extended_2.fq -o output --draw to quickly and automatically get a set of statistics and a depth-distribution plot similar to what you obtained with Geneious above. The plot and the statistics, especially the deviation in coverage, can help you judge whether there is an improvement without manually checking every graph.

  • Extra tip2: you may notice that the contig names do not follow their merging order in the graph, e.g. 61416 is next to 57676, not 62030. This is due to a minor bug introduced in previous versions; it does not affect the generated DNA sequence. The bug is fixed in the latest GitHub version of GetOrganelle, so you may want to upgrade to make the results more helpful for manual checking.
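Building on the coverage-deviation idea in tip 1, a low-coverage segment like the one reported at the top of this issue could also be screened for with a simple summary statistic. The following is a minimal sketch, assuming a whitespace-separated per-position depth table whose third column is read depth (the layout written by `samtools depth`). The function name `coverage_stats` is hypothetical, and its coefficient of variation (sd / mean) is an illustrative statistic, not necessarily the one reported by evaluate_assembly_using_mapping.py.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise per-position read depth from a table whose
# third column is depth (e.g. "contig<TAB>position<TAB>depth").
# Prints: positions, mean depth, standard deviation, coefficient of variation.
coverage_stats() {
  awk '{ n++; s += $3; ss += $3 * $3 }
       END {
         if (n == 0) { print "no positions" > "/dev/stderr"; exit 1 }
         mean = s / n
         var = ss / n - mean * mean      # population variance, single pass
         if (var < 0) var = 0            # guard against rounding below zero
         sd = sqrt(var)
         printf "positions=%d mean=%.2f sd=%.2f cv=%.3f\n", n, mean, sd, (mean > 0 ? sd / mean : 0)
       }' "$1"
}
```

Usage: `coverage_stats sample.depth.txt`. An assembly that picked the wrong parallel contig in the IR would be expected to show a noticeably higher cv than one with uniform coverage, so sorting samples by cv may point to the ones worth re-extracting first.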

Kinggerm changed the title "Error in disentangling result from mt.fastg" to "Error in disentangling result from embplant_pt-embplant_mt.fastg" on Nov 24, 2022
@Kinggerm
Owner

Any updates?

@365gemini
Author

Hi Dr Jin,

We have successfully extracted 19 of 24 true plastomes after using the suggested script and adjusting --depth-factor within the range 3 to 7. I have attached the related files from one failed sample, from both before and after the adjustment.

Thanks.
after.zip
before.zip

@Kinggerm
Owner

For this failed sample, making the depth factor as strict as 1.5 would work. I guess you can fine-tune the parameter in the same way for the remaining 5 samples. Please let me know how it goes.

@365gemini
Author

All samples were extracted successfully. Thank you so much!

Kinggerm changed the title "Error in disentangling result from embplant_pt-embplant_mt.fastg" to "Mistake of GetOrganelle in choosing parallel contigs from embplant_pt-embplant_mt.fastg: mt vs pt" on Nov 30, 2022
Kinggerm changed the title "Mistake of GetOrganelle in choosing parallel contigs from embplant_pt-embplant_mt.fastg: mt vs pt" to "Mistake in choosing parallel contigs from embplant_pt-embplant_mt.fastg: mt vs pt" on Nov 30, 2022
@Kinggerm Kinggerm reopened this Nov 30, 2022