-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue with low proportion_repeat #102
Comments
this will have some performance consequences so we should evaluate.
Thanks for reporting. Using the lower --proportion-repeat does indeed uncover some problems. I can recreate and pushed a fix. We'll need to evaluate the change in performance. I also have another idea to try to fix which I'll post shortly. Note that this is an interesting read with this repeat structure (split on ACCTTT):
|
BTW, I'm interested to hear what @hdashnow thinks, but my experience is that lowering proportion_repeat is usually not fruitful. |
Hi all, thank you for looking into this! Perhaps I'm not fully understanding this parameter well. I'm interested in looking at milder expansions and felt that the default value of -p 0.8 was too high so was playing around with it. If I understand this correctly, with a 150 bp PE read, a value of 0.8 for a trinucleotide repeat expansion would only consider expansions past 40 repeats (150*0.8/3). |
@mhguo1 You are correct, STRling is tuned to discover larger expansions. Are you looking to discover new loci, or call known ones? If you are looking at a specific locus you can force STRling to genotype it even if it is small. |
I'm mostly hoping to call known ones, so I guess I will just add a custom bed file with known sites to -l to the call command. Thanks! |
Great! Let us know if you have additional questions, or see any other errors. It was useful to get this one fixed. |
Hello, I'm getting odd errors when using lower --proportion-repeat in strling extract. For example, when running this command:
strling extract -f GRCh38_full_analysis_set_plus_decoy_hla.fa -p 0.4 NA06989.final.cram test.bin
I get this error:
fatal.nim(49) sysFatal
Error: unhandled exception: genome_strs.nim(39, 12)
w.start < w.stop
repeat ACCTTT not found in expected region for (chrom: "chr2", start: 71388200, stop: 71388500, repeat: "ACCTTT"), TGAAGGCAGCTAAATTCTCTTACCCTGAGGCTAAGGGCAAGTAGTAGGTAACAAAGGAGTGTAAAGGAATTTATCTAGATAAGTTTATTTACTTTTGCCGACCTTTGATCATCCGACCTTTGATCATCCGACCTTTGATCATCTGACCTTTGATCATCTGACCTTTGATCATCCGACCTTTGATCATCTGACCTTTGATCATCCGCGTGCAGGACTGCTCCCTACAGGCGGGGGCAACAACTACCCACAGATTGTGTTGGCTCCAGGCCTTTGTCATTAAATCTGTACTAAATAAATACA, (chrom: "chr2", start: 71388500, stop: 71388500, repeat: "ACCTTT") [AssertionDefect]strling version: 0.5.1
fatal.nim(49) sysFatal
Error: unhandled exception: unpack.nim(59, 12)
fs != nil
[strling] got nil fileStream in unpack_file. check given file-path [AssertionDefect]I am using a 1000 Genomes sample downloaded from ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR323/ERR3239459/NA06989.final.cram
Thanks!
The text was updated successfully, but these errors were encountered: