Multiple variants with structural variations; 'Consensus made' keeps repeating #223

Open
SethMusker opened this issue Dec 20, 2022 · 2 comments

SethMusker commented Dec 20, 2022

Hi,

I've had generally good success assembling chloroplasts with get_organelle_from_assembly.py!

One of my samples fails, though. The log shows the message "Consensus made: (187292-|189390+)" three times, followed by "Disentangling failed: 'Unable to generate result with single copy vertex percentage < 50%'".

Is this intended behaviour? I imagined that once the consensus is made, disentangling would be attempted again with the consensus sequence replacing the two merged edges, but this is not happening. These two edges have the same length and similar depth, so I understand why they're a cause for concern in principle, but they're also only 155 bp long and differ by a single base (C/T).

get_org.log.txt
slimmed_assembly_graph - Copy.gfa.txt
slimmed_assembly_graph - Copy.csv

If this is intended, I suppose the next step would be to manually remove one of the edges from the starting graph and repeat?

Cheers,
Seth

Kinggerm (Owner) commented Dec 20, 2022

Hi Seth,

Thanks for providing enough details about this complex situation.

  1. You are right that these three trials are problematic. In principle, there were meant to be two different trials, which would produce two repeats of the same message. Coincidentally, I corrected this problem in a recent update on a testing branch of GetOrganelle. You still deserve the credit for reporting it. BTW, the testing branch contains many major changes and is not ready to use.
  2. Judging from the graph, this sample seems to suffer from both mt-pt mixing (mitochondrial and plastid contigs in one graph) and multiple variants (heteroplasmy or contamination). Both issues can violate GetOrganelle's single-component (i.e. single-variant) assumption. Here, the contigs with a depth of about 10 should be the mt contigs. Because the coverage differs so much (about 10 vs. about 150), GetOrganelle can easily separate mt from pt. However, the two putative variants in this sample have similar average coverage, which leads to two difficult problems.
    • SNPs usually produce simple parallel contigs such as 187292 and 189390. For these, GetOrganelle can make a consensus where possible (it did so correctly for 187292 and 189390), generate multiple results that each contain a different SNP, or generate a single result from the highest-depth contig. In all of these cases the single-component assumption still holds.
    • Structural variations may yield a complex, tangled graph that simple algorithms struggle to tell apart from a genuinely complex single-variant graph. Knowing what an IR-containing pt graph typically looks like, I can tell that 188784, 189490, 189444, 186851, and 188912 together form a structure comprising two pt variants that differ at the IR boundaries. This is the real problem that triggers the disentanglement failure. We may leave this issue open until there is a neat solution.
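
To make the SNP case concrete, here is a minimal sketch of how two equal-length parallel contigs that differ at a few positions could be collapsed into one IUPAC consensus. This is an illustration only, not GetOrganelle's actual code: the function name `iupac_consensus` is hypothetical, and it assumes both sequences are already in the same orientation (for a pair like 187292-|189390+, one of them would first need to be reverse-complemented).

```python
# Two-base IUPAC ambiguity codes, keyed by the alphabetically sorted base pair.
PAIR_CODES = {"AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M"}


def iupac_consensus(seq_a, seq_b):
    """Merge two equal-length sequences, writing an ambiguity code at each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("parallel contigs must have the same length")
    merged = []
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b:
            merged.append(a)
        else:
            # Anything that is not a plain two-base mismatch (e.g. N) falls back to N.
            merged.append(PAIR_CODES.get("".join(sorted((a, b))), "N"))
    return "".join(merged)


if __name__ == "__main__":
    # A single C/T difference becomes Y in the consensus.
    print(iupac_consensus("ACGT", "ATGT"))
```

A C/T difference like the one in this sample would show up as a single Y in the merged sequence.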

In summary, you are right about manually removing a contig from the graph and then rerunning. However, the contigs to focus on are not 187292 and 189390 (though you may remove one of them if you think the consensus is not a good idea). Instead, the real solution is to remove either 189444 or 186851.
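
If you prefer to script the removal rather than edit the graph by hand (e.g. in Bandage or a text editor), a rough sketch follows. It assumes the graph is a plain GFA1 text file in which segments are `S` lines and connections are `L` (or `C`) lines; the helper name `remove_segments` is hypothetical and not part of GetOrganelle, and path (`P`) records are not handled.

```python
def remove_segments(gfa_text, names):
    """Return GFA1 text without the named segments and any links touching them."""
    drop = set(names)
    kept = []
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        record = fields[0]
        # S lines: field 1 is the segment name.
        if record == "S" and len(fields) > 1 and fields[1] in drop:
            continue
        # L and C lines: fields 1 and 3 are the two segment names involved.
        if record in ("L", "C") and len(fields) > 3 and (
            fields[1] in drop or fields[3] in drop
        ):
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Toy graph, not the real sample: three segments, three links.
    toy = (
        "S\t1\tACGT\n"
        "S\t2\tGGCC\n"
        "S\t3\tTTAA\n"
        "L\t1\t+\t2\t+\t0M\n"
        "L\t2\t+\t3\t-\t0M\n"
        "L\t1\t+\t3\t+\t0M\n"
    )
    print(remove_segments(toy, ["2"]), end="")
```

For this sample, you would read the slimmed graph file, call the helper with `["189444"]` (or `["186851"]`), write the result to a new file, and rerun get_organelle_from_assembly.py on it.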

Best,
Jianjun

@Kinggerm Kinggerm changed the title 'Consensus made' keeps repeating Multiple variants with structural variations; 'Consensus made' keeps repeating Dec 20, 2022
SethMusker (Author) commented

Hi Jianjun,

Immense thanks! Your reply was really informative and helpful. After removing 189444 the assembly finished nicely. For the record, the log is attached.

get_org.log.txt

All the best,
Seth
