
Running SQANTI3 filter

Ángeles Arzalluz-Luque edited this page Apr 26, 2022 · 22 revisions

Filtering Isoforms using SQANTI3 output and pre-defined rules

I've made a lightweight filtering script, based on the SQANTI3 output, that filters isoforms for two things: (a) intra-priming and (b) short-read junction support.

The script usage is:

usage: sqanti3_RulesFilter.py [-h] [--sam SAM] [--faa FAA] [-a INTRAPRIMING]
                              [-r RUNALENGTH] [-m MAX_DIST_TO_KNOWN_END]
                              [-c MIN_COV] [--filter_mono_exonic] [--skipGTF]
                              [--skipFaFq] [--skipJunction] [-v]
                              sqanti_class isoforms gtf_file

sqanti3_RulesFilter.py: error: the following arguments are required: sqanti_class, isoforms, gtf_file

python sqanti3_RulesFilter.py [classification] [fasta] [gtf]
         [--sam SAM] [-a INTRAPRIMING] [-r RUNALENGTH] [-c MIN_COV] [-m MAX_DIST_TO_KNOWN_END]

where -a sets the fraction of genomic 'A's (downstream of the isoform's 3' end) above which the isoform is flagged as intra-priming and filtered. The default is -a 0.6.

-r is a second intra-priming check on genomic 'A's that looks at the length of the immediate run of 'A's instead of their overall fraction. The default is -r 6.
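To make the two checks concrete, here is a minimal Python sketch of how -a and -r could be applied to the genomic sequence just downstream of an isoform's 3' end. The helper name looks_intraprimed, the window passed in, and measuring the A-run from the first downstream base are assumptions for illustration, not the script's actual code.

```python
def looks_intraprimed(downstream_seq: str, frac_a: float = 0.6, run_a: int = 6) -> bool:
    """Hypothetical helper: True if the 3' end looks like intra-priming.

    downstream_seq is the genomic sequence immediately downstream of the
    isoform's 3' end (the window length is an assumption, not from the script).
    """
    seq = downstream_seq.upper()
    if not seq:
        return False
    # -a: overall fraction of genomic 'A's in the window above the threshold
    if seq.count("A") / len(seq) > frac_a:
        return True
    # -r: immediate run of 'A's, assumed to start at the first downstream base
    return seq.startswith("A" * run_a)


print(looks_intraprimed("AAAAAAAGCTTCGATCGATC"))  # True  (run of 7 'A's)
print(looks_intraprimed("GCATCGATCGTACGATCGAT"))  # False (25% 'A', no run)
```

The first example is caught only by the run-length check (45% 'A' is below 0.6), which is why -r is useful as a separate criterion.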

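-m sets the maximum distance to an annotated 3' end (the diff_to_gene_TTS field in the classification output) that overrides the intra-priming rule: an isoform whose 3' end lies within this distance of a known 3' end is not filtered for intra-priming.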
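One way to read the -m override, sketched in Python under the assumption that the isoform has already been flagged for intra-priming and that diff_to_gene_TTS is a signed distance in bp (missing when there is no matching annotated end). The function name end_is_reliable and the inclusive `<=` comparison are illustrative choices, and 50 below is an arbitrary example value, not the script's default.

```python
import math
from typing import Optional


def end_is_reliable(intraprimed: bool, diff_to_gene_tts: Optional[float],
                    max_dist_to_known_end: int) -> bool:
    """Hypothetical: apply the -m override to an intra-priming call."""
    if not intraprimed:
        return True
    # No annotated 3' end to fall back on (missing or NaN), so the flag stands.
    if diff_to_gene_tts is None or math.isnan(diff_to_gene_tts):
        return False
    # Close enough to an annotated 3' end: do not filter for intra-priming.
    return abs(diff_to_gene_tts) <= max_dist_to_known_end


print(end_is_reliable(True, -12, 50))  # True: 12 bp from a known end
print(end_is_reliable(True, 300, 50))  # False: too far to override
```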

-c sets the minimum short-read junction support (the min_cov field in _classification.txt) and can only be used if you have short-read data.
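A minimal sketch of the -c check at the isoform level, assuming the classification file has an all_canonical column with the values canonical / non_canonical and that min_cov is NA when no short-read data were supplied. Since min_cov is the lowest short-read coverage over the isoform's junctions, requiring it to reach the threshold covers every non-canonical junction. The function name and treating -c as an inclusive minimum are assumptions.

```python
import math


def junctions_supported(all_canonical: str, min_cov_field: str, min_cov: int) -> bool:
    """Hypothetical isoform-level -c check using the min_cov field."""
    if all_canonical == "canonical":
        return True  # no non-canonical junction needs short-read support
    try:
        cov = float(min_cov_field)
    except ValueError:
        return False  # 'NA' (no short-read data): nothing can rescue the junction
    if math.isnan(cov):
        return False
    # Assumption: -c is an inclusive minimum
    return cov >= min_cov


print(junctions_supported("non_canonical", "12", 3))  # True
print(junctions_supported("non_canonical", "NA", 3))  # False
```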

For example:

python sqanti3_RulesFilter.py test_classification.txt \
                         test.renamed_corrected.fasta \
                         test.gtf

The current filtering rules are as follows:

  • If a transcript is FSM (full splice match), it is kept unless its 3' end is unreliable (intra-priming).
  • If a transcript is not FSM, it is kept only if all of the following are true:
    • (1) its 3' end is reliable.
    • (2) it does not have a junction labeled as RTSwitching.
    • (3) every junction is either canonical or has short-read coverage above the -c threshold.
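The rules above can be restated as a small decision function. This is a hypothetical Python sketch, not the script's code: the Junction and Isoform classes are illustrative, it assumes the per-isoform flags (FSM status, reliable 3' end, RT-switching junction) and per-junction short-read coverage have already been computed, and it treats -c as an inclusive minimum.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Junction:
    canonical: bool
    short_read_cov: Optional[int] = None  # None when there is no short-read data


@dataclass
class Isoform:
    is_fsm: bool
    end_reliable: bool  # False if the 3' end was flagged as intra-priming
    junctions: List[Junction] = field(default_factory=list)
    has_rts_junction: bool = False


def passes_rules(iso: Isoform, min_cov: Optional[int] = None) -> bool:
    """Hypothetical restatement of the filtering rules for one isoform."""
    if iso.is_fsm:
        return iso.end_reliable  # FSM: only the 3' end matters
    if not iso.end_reliable:  # rule (1)
        return False
    if iso.has_rts_junction:  # rule (2)
        return False
    for j in iso.junctions:  # rule (3)
        if j.canonical:
            continue
        supported = (min_cov is not None and j.short_read_cov is not None
                     and j.short_read_cov >= min_cov)
        if not supported:
            return False
    return True


iso = Isoform(is_fsm=False, end_reliable=True,
              junctions=[Junction(True), Junction(False, short_read_cov=8)])
print(passes_rules(iso, min_cov=3))  # True
print(passes_rules(iso))             # False: no -c, so the non-canonical junction is unsupported
```

Note that without short-read data (no -c), any non-FSM isoform with a non-canonical junction is filtered, which matches rule (3) as written.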