Request: presence/absence matrix #10
Comments
Hi Sion, thanks for using ProphAsm! What exactly do you envision this presence/absence matrix looking like? Would it be an N x N matrix of files with simplitigs from the k-mers shared across pairs of genomes?
Hi Karel, thanks for the quick reply. I was thinking of something similar, but as a simplitigs x samples matrix showing simplitig presence/absence in each sample. If this information is not calculated in ProphAsm, it might be better achieved using external programs. I have used unitig-caller to achieve this (it relies on suffix arrays (FM-index) provided by SeqAn3). Maybe a description of this process in the README would suffice?
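To make the requested output concrete, here is a minimal sketch of a simplitigs x samples presence/absence matrix built by exact string matching on both strands. It is not part of ProphAsm or unitig-caller; the function names and the in-memory dictionaries are assumptions made for illustration, and a scan like this would not scale to large collections the way an FM-index does.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def presence_absence(simplitigs: dict, samples: dict) -> dict:
    """Build a simplitigs x samples matrix of 0/1 values.

    simplitigs: hypothetical mapping of simplitig name -> sequence
    samples:    hypothetical mapping of sample name -> list of contig sequences
    A simplitig counts as present if it, or its reverse complement,
    occurs as an exact substring of any contig in the sample.
    """
    matrix = {}
    for simplitig_name, simplitig_seq in simplitigs.items():
        forward = simplitig_seq.upper()
        reverse = reverse_complement(forward)
        row = {}
        for sample_name, contigs in samples.items():
            found = any(
                forward in contig.upper() or reverse in contig.upper()
                for contig in contigs
            )
            row[sample_name] = int(found)
        matrix[simplitig_name] = row
    return matrix


if __name__ == "__main__":
    simplitigs = {"s1": "AACCG", "s2": "GGGCC"}
    samples = {
        "A": ["TTAACCGTT"],  # contains s1 on the forward strand
        "B": ["GGCGGTTGG"],  # contains s1 on the reverse strand
        "C": ["TTTTTT"],     # contains neither simplitig
    }
    for name, row in presence_absence(simplitigs, samples).items():
        print(name, row)
```

Each row is a simplitig and each column a sample, which matches the layout described above.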
@karel-brinda Any thoughts on adding this feature? Something like unitig-caller would be great.
@anwarMZ It was a long time ago that I worked on this so I will summarise what my (incomplete) notes say I did at the time. Please note that this was most likely insufficient for a complete analysis but might be useful to you to expand upon in the future. I used unitig-caller to convert the workflow to one I was familiar with:
I believe that the limitation here was that simplitigs are effectively concatenations of unitigs (a stochastic choice of a single path in a de Bruijn graph constructed from k-mers) and as such might not actually exist in any sample in your collection (or might exist but be interrupted by another genetic element). Simple string matching may not be sufficient to identify the presence of a simplitig in a sample unless it allows partial matching, since a simplitig may be composed of two or more unitigs. @karel-brinda might be able to correct me or expand upon this. Hope that helped!
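One way to picture the partial matching mentioned above is to score each simplitig by the fraction of its k-mers found in a sample, rather than requiring the whole string to match. The sketch below is an illustration under assumptions, not what ProphAsm or unitig-caller does: the function names are hypothetical, k is assumed to be the same k used to compute the simplitigs, and k-mers are compared in canonical form (the lexicographically smaller of a k-mer and its reverse complement).

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def canonical_kmers(seq: str, k: int) -> set:
    """Collect the canonical k-mers of a sequence (strand-independent)."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


def kmer_containment(simplitig: str, sample_contigs: list, k: int) -> float:
    """Fraction of the simplitig's canonical k-mers present in the sample.

    1.0 means every k-mer of the simplitig occurs somewhere in the sample,
    even if the simplitig as a whole is interrupted or spans several unitigs.
    Returns 0.0 if the simplitig is shorter than k.
    """
    query = canonical_kmers(simplitig, k)
    if not query:
        return 0.0
    sample = set()
    for contig in sample_contigs:
        sample |= canonical_kmers(contig, k)
    return len(query & sample) / len(query)


if __name__ == "__main__":
    # Toy sequences with k=3, for illustration only.
    print(kmer_containment("AACGT", ["AACGT"], 3))
    print(kmer_containment("AACGT", ["TTAACTT"], 3))
```

A threshold on this fraction (chosen by the user; no value is suggested here) could then turn the scores into a presence/absence call per sample.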
Great tool! I have a quick feature request. An option to output a presence/absence matrix or similar output would be very useful.
On a similar note, do you know of any software that would achieve this in a scalable manner for application to large prokaryotic datasets?