Understanding step 4.2 variant filters #53

Open
bkinnersley opened this issue Mar 1, 2024 · 2 comments

Comments

@bkinnersley

Hello,

Thank you so much for such a useful tool. I have been applying this to single-cell multiome (RNA-Seq plus ATAC-Seq) libraries, using the parameters as recommended here for the different library types. I am trying to understand why variants are filtered out (as recorded in the sixth column, "FILTER", of the output file), and I would greatly appreciate it if you could direct me to some documentation on this. As far as I can see, the filter categories are as follows:

  1. "" (empty string)
  2. "Noisy_site"
  3. "PoN"
  4. "Multi-allelic"
  5. "LC_Upstream"
  6. "LC_Downstream"
  7. "Clustered"
  8. "Cell_type_noise"
  9. "Min_cell_types"

While many are straightforward to understand, I am less sure about others (particularly "Noisy_site", "LC_Upstream", "LC_Downstream" and "Cell_type_noise"), so any help with this would be greatly appreciated. Thanks very much.

Best wishes

Ben
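For readers who want a quick overview of which filters fire most often in their own output, here is a minimal Python sketch that tallies the FILTER column. It assumes the output is tab-separated, that header lines start with `#`, that FILTER is the sixth column as described above, and that a site failing several filters lists them separated by commas. None of these details is confirmed in this thread, and the function name `count_filters` is hypothetical.

```python
from collections import Counter


def count_filters(lines, filter_col=5, sep="\t"):
    """Tally FILTER values from an iterable of text lines.

    Assumptions (not confirmed in this thread): tab-separated columns,
    header lines start with '#', FILTER is the sixth column (index 5),
    and multiple filters on one site are comma-separated.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split(sep)
        if len(fields) <= filter_col:
            continue  # skip rows with too few columns
        value = fields[filter_col]
        # An empty FILTER string is counted under its own "" key
        labels = value.split(",") if value else [""]
        counts.update(labels)
    return counts
```

Passing an open file handle, for example `count_filters(open("calls.tsv"))` with a hypothetical file name, returns a `Counter` whose `most_common()` method lists the filters by frequency.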

@isidroc
Contributor

isidroc commented Mar 13, 2024

Dear Ben, thanks for your question. The filters are described in the legend of Supplementary Figure 8 in our paper - sorry for not making that info more accessible in the repo:

BetaBin: the candidate mutation was not supported by a sufficient number of reads with the alternate allele to pass the Beta-binomial test;

Cell_type_noise: the number of reads supporting the alternate allele is only significant (Beta-binomial test) when the test is applied to all cells across all cell types considered, but not when it is applied to each cell type individually, or there are multiple alternate alleles, which suggests a noisy site;

Clustered: the candidate mutation was filtered because another candidate mutation maps within 5 bp;

LC_Upstream: the candidate mutation was filtered because it mapped upstream of a low-complexity region;

LC_Downstream: the candidate mutation was filtered because it mapped downstream of a low-complexity region;

Multiple_cell_types: the variant was found in different cell types of the same sample;

No_reads: no reads supporting the alternative allele were found;

Noisy_site: the candidate mutation was filtered because there is a significant number of reads supporting the alternate allele in a single cell type when running the Beta-binomial test for each cell type independently, but the site is also significant when applying the Beta-binomial test to all single cells across all cell types in a sample together;

PoN: variant filtered by the SComatic Panel of Normals (PoNs).
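Several of the filters above (BetaBin, Cell_type_noise, Noisy_site) depend on a Beta-binomial test of the alternate-allele read count. As a rough illustration of what such a test computes, the sketch below evaluates the upper-tail probability of seeing at least `alt_reads` alternate reads out of `depth` reads under a Beta-binomial error model. This is not SComatic's code: the function names are hypothetical, and the `alpha` and `beta` values are placeholders that the caller must supply, not the tool's defaults.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_pmf(k, n, alpha, beta):
    """Beta-binomial probability of exactly k successes in n trials."""
    log_choose = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    )
    log_p = log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
    return math.exp(log_p)


def betabinom_pvalue(alt_reads, depth, alpha, beta):
    """Upper-tail p-value P(X >= alt_reads) for X ~ BetaBinomial(depth, alpha, beta).

    alpha and beta describe the assumed background error model; the values
    used by the tool itself are not given in this thread.
    """
    tail = sum(
        betabinom_pmf(k, depth, alpha, beta) for k in range(alt_reads, depth + 1)
    )
    return min(1.0, tail)  # guard against floating-point overshoot
```

Under this reading, the per-cell-type and pooled tests mentioned in the Cell_type_noise and Noisy_site descriptions would call the same function on read counts from a single cell type or from all cell types combined, then compare each p-value with a chosen significance threshold.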

Hope this helps? Thanks
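The Clustered filter described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's implementation: the function name `flag_clustered` is hypothetical, and treating "within 5 bp" as a distance of at most 5 between positions on the same chromosome is an assumption.

```python
from collections import defaultdict


def flag_clustered(sites, max_dist=5):
    """Return the set of (chrom, pos) sites that have another site nearby.

    Assumption: two candidates are "clustered" when they lie on the same
    chromosome and their positions differ by at most max_dist.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in sites:
        by_chrom[chrom].append(pos)

    flagged = set()
    for chrom, positions in by_chrom.items():
        ordered = sorted(set(positions))
        # After sorting, a site's nearest neighbour is always adjacent,
        # so comparing consecutive pairs is sufficient.
        for prev, cur in zip(ordered, ordered[1:]):
            if cur - prev <= max_dist:
                flagged.add((chrom, prev))
                flagged.add((chrom, cur))
    return flagged
```

Both members of a close pair are flagged here; whether the tool drops both candidates or only one is not stated in the thread.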

@MayaTalukdar

Thank you so much for your tool! Following up on this question, what is the Min_cell_types filter? I believe it is not described in your paper.
