Segfault loading input SNP file with unexpected format #6
Comments
Hi, @yancylo. If it's possible to share it, please email your If you can't share it (I am not a scientist, but I assume there might be reasons you can't), please run
$ file your_ld_file.ld
your_ld_file.ld: ASCII text, with CRLF line terminators
What I'm looking for specifically is DOS/Windows line endings. It's possible that RELI needs to be updated to be more robust when reading files in this format. The workaround is to simply convert to Unix (linefeed-terminated) format with Other ways you can troubleshoot:
If any macOS doesn't have a
Please report back if this resolves / doesn't resolve your segfault error. |
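The line-ending check and conversion described above can also be sketched in Python, for systems where the usual command-line tools are not installed. This is a minimal sketch, not part of RELI: the function names `has_crlf`, `to_unix`, and `convert_file` are made up for illustration, and the script only handles CRLF endings, not other encoding problems.

```python
from pathlib import Path


def has_crlf(data):
    """Return True if the bytes contain any DOS/Windows (CRLF) line endings."""
    return b"\r\n" in data


def to_unix(data):
    """Replace CRLF line endings with LF; all other bytes are left untouched."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Rewrite a file in place with LF endings. Returns True if it changed."""
    p = Path(path)
    data = p.read_bytes()
    if not has_crlf(data):
        return False
    p.write_bytes(to_unix(data))
    return True
```

Calling `convert_file("your_ld_file.ld")` rewrites the file only when CRLF endings are actually present, so it is safe to run on a file that is already in Unix format. Keep a copy of the original if the file came from a collaborator.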
Hi @ernstki thank you for your help. I tried
Are there any other requirements for the LD input file that I am overlooking? For example, do the lists of rsIDs need to be sorted or organized in a specific way? Thanks |
@yancylo Can you please update the issue description with the exact command line you used (sanitizing any filenames if you need to)? The message My advice about using |
Hi @ernstki thanks for checking, here is the command I used
|
@yancylo Can you confirm with If you are in a completely Unix/macOS environment, and not analyzing files from collaborators, or files that made a pass through Microsoft Excel, there's usually little need. But it's always a first troubleshooting step when I find myself in situations like this. It would also help if you could |
Emailing or sharing the input files—if you are able—via Dropbox / Google Drive / OneDrive / whatever-you-use with tftoolsadmin -at- cchmc.org (that's me) would still be a tremendous help here. It completely takes out the guesswork. |
@ernstki Sorry I will need to get my manager's approval before sharing the data files with you, I will get back to you on that. Re: I did cat -A on both
Do the .snp and .ld files need to be sorted in a specific order for RELI to work? |
That's understandable. What you provided should be enough, though. I'm not aware of any requirement that the input files be sorted, but I believe the This is based solely on an input file that I know works with RELI, which is Either way, you have found a bug in RELI (thanks for bringing it to our attention!), and it will still need to be updated to be more robust against malformed input. |
Here is how I converted your SNPs to the expected format:
Then give The last SNP (rs118107401) appears to have incorrect coordinates to me, but I'm not a bioinformatician, so you may know the reason for that. See also this script for fetching coordinates in the appropriate (BED4) format based on rs ID, if that's what you're starting from. (Note that it defaults to Using your
$ cut -f5 multiple_sclerosis.snp | getsnpcoords > multiple_sclerosis.bed
|
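Before handing a converted file to RELI, it can help to confirm that every line really has the four-column BED4 shape (chromosome, start, end, name). The following is a minimal sketch that assumes the standard tab-separated BED layout; what RELI itself checks is not stated in this thread, and `check_bed4_line` is a hypothetical helper, not part of RELI or `getsnpcoords`. The rule that start must not exceed end is likewise an assumption.

```python
def check_bed4_line(line):
    """Return a list of problems found in one BED4 line (empty if none)."""
    problems = []
    # Strip LF and any stray CR so CRLF files are judged on their fields only.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 4:
        problems.append(
            "expected 4 tab-separated fields, found {}".format(len(fields))
        )
        return problems
    chrom, start, end, name = fields
    if not chrom:
        problems.append("empty chromosome field")
    if not (start.isdigit() and end.isdigit()):
        problems.append("start and end must be non-negative integers")
    elif int(start) > int(end):
        # Assumption: a start beyond the end is always a mistake.
        problems.append("start must not exceed end")
    if not name:
        problems.append("empty name field")
    return problems
```

Running this over each line of a file such as `multiple_sclerosis.bed` and printing any non-empty result, together with the line number, points at the exact rows that need fixing.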
Thank you very much for your help! I converted my input list of SNPs to the standard |
Oh! Poor attention to detail on my part. Who knows where I came up with that extra "1"; thanks for sorting me out on that. Also, I'm glad we were able to work through it and get you rolling. Let's leave this issue open until I can get the code updated to not crash on SNP input like yours. GitHub will generate a few more automatic emails as I'm referencing it from commits and PRs, but you can safely ignore those. I don't have expansive knowledge of all the bioinformatics formats out there, so if you don't mind sharing, what analysis tool, web site, or database is generating the SNP list in that format? If it's a very common tool/database, we might want to look at just accepting SNP lists in that format directly. |
I am trying to run RELI on a list of 3,635 genome-wide significant variants which are aggregated into 78 LD blocks, but after RELI loaded in the snp table, it failed when loading the LD table. The error I got is "Segmentation fault":
I followed the format of SLE_EU.ld in the example/ folder, but I am not sure if I am missing something. My .ld file contains 78 rows. Each row starts with the top hit variant of the LD block, followed by ":" and a list of the rsIDs in the same LD block that have genome-wide significant p-values. All rsIDs in the .ld file are listed in the .snp file (BED4 format).
I tried to run RELI without the .ld file (only run with .snp). It finished without any errors, but of course the results were not legitimate because LD was ignored - I got a Ratio of 1 (100% intersect) and p-value of 0.
Please advise on how to properly account for LD in the analysis. Thank you very much!
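The consistency described above (every rsID in the .ld file also appears in the .snp file) can be verified with a short script. This is a minimal sketch, not RELI's own parser: it assumes the .snp file carries the rsID in its fourth column and that IDs have the usual `rs` plus digits form, and it pulls IDs out of each .ld row with a regular expression so that it does not depend on the exact delimiter after the ":". The names `snp_names` and `missing_ld_ids` are made up for illustration.

```python
import re

# Assumption: variant IDs look like "rs" followed by digits.
RS_ID = re.compile(r"\brs\d+\b")


def snp_names(snp_lines):
    """Collect the name (4th) column from whitespace-separated BED4 lines."""
    names = set()
    for line in snp_lines:
        fields = line.split()
        if len(fields) >= 4:
            names.add(fields[3])
    return names


def missing_ld_ids(ld_lines, known):
    """Map 1-based .ld line number to the sorted rsIDs absent from `known`."""
    missing = {}
    for number, line in enumerate(ld_lines, start=1):
        absent = sorted(set(RS_ID.findall(line)) - known)
        if absent:
            missing[number] = absent
    return missing
```

Passing the lines of the two files (for example from `open(path).readlines()`) and getting back an empty dictionary means every rsID in the .ld file was found in the .snp file. It does not rule out other format problems, such as a .snp file whose columns are in an unexpected order.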