-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ntHash with a Phred filter #107
Comments
Good idea, I like it. Is it something we'd like to implement directly in ntHash though? I think it can be a separate class (as a template, e.g., PhredFilter<NtHash>) that rolls ntHash and updates the Phred sum in each iteration. |
Yeah that would be my approach too. The new class checks the phred sum / avg and rolls the nthash iterator as necessary. Are you thinking of template because of seednthash? The other thing I'm not 100% on is if we should do rolling sum or keep an array of quality score and make sure no single quality score is below the threshold. |
Yeah basically I was thinking that since we have four ntHash classes, it'd be unusual to have a new feature implemented in only one of them. Although I'm not sure if Phread scores in spaced seeds make sense.
For that we can just keep a rolling minimum. |
Aren't they all the same nthash class under the hood with different constructor? We could just have four different constructor for phredntHash too
Hmm don't think that can work because what happens if you have multiple bases that violate your threshold. I think the best way is to skip k bases when you encounter low quality bases. |
They have the same methods, but there's no OO relationship between them. They're more like four separate classes, so we'd have to implement something like SeedPhreadNtHash and also two blind versions-BlindPhreadNtHash and BlindSeedPhreadNtHash :D I'm more or less on the template class side, it'll be much cleaner. We can chat about this more, your call eventually.
I was thinking of the range minimum query (RMQ) problem, but since we're sliding a window, we should be able to use a deque even in a case of multiple low-quality bases (e.g., this solution?) |
In that case, let's do template then.
This will need some discussion too since we need to decide which one to go with. I do worry that on a base lvl it might be too stringent. Of course we could implement both |
Enhancement:
I am thinking we could have a ntHash iterator with a phred filter. basically we could add a member which is a rolling sum of the Phred quality rather than the average. Every time we compute the k-mer hash, we check the quality score. If pass, we output, otherwise roll to the next hash.
The text was updated successfully, but these errors were encountered: