
3. Tutorial (Advanced Settings)

Amir Mohseni edited this page Mar 4, 2024 · 35 revisions

Advanced settings allow you to realize the full potential of ALLEGRO.

  1. scorer or -s
  • The default scorer for ALLEGRO is 'dummy', which assigns a score of 1.0 to every guide and so treats all guides as equal. You may change this value to 'ucrispr', which uses the uCRISPR cleavage efficacy predictor developed by Zhang et al. in:

    Zhang, Dong, et al. "Unified energetics analysis unravels SpCas9 cleavage activity for optimal gRNA design." Proceedings of the National Academy of Sciences 116.18 (2019): 8693-8698.

    Scoring guides may take a long time, which is why we recommend running ALLEGRO in a tmux or screen session in the background. ALLEGRO caches the scored guides so that their scores do not have to be recalculated in a later experiment. To discard the cached scores for a species, delete the 'data/cache/{species_name}.pickle' file. Doing so prompts ALLEGRO to recalculate the scores for the guides in that file using uCRISPR.

  • Setting the scorer to 'ucrispr' requires a beta value, which we discuss next.
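Clearing the cache for one species can also be scripted. The following is a minimal sketch, assuming it is run from the ALLEGRO root directory so that the relative 'data/cache' path resolves; the helper name `clear_cached_scores` is hypothetical and not part of ALLEGRO.

```python
from pathlib import Path


def clear_cached_scores(species_name, cache_dir="data/cache"):
    """Delete the cached guide scores for one species, if the file exists.

    Hypothetical helper: returns True if a cache file was removed,
    False if no cache file was found for that species.
    """
    cache_file = Path(cache_dir) / f"{species_name}.pickle"
    if cache_file.is_file():
        cache_file.unlink()
        return True
    return False
```

For example, `clear_cached_scores("test_fasta")` would remove 'data/cache/test_fasta.pickle' so that the guides for that species are rescored on the next run.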

  2. beta or -b
  • Integer value. The final gRNA set must contain at most this many guides. Think of it as your budget. Beta works best, and a nonzero value is expected, when paired with scorer: 'ucrispr'. This is because changing the scorer from 'dummy' changes ALLEGRO's objective from minimizing the set size to maximizing the score of the selected guides while keeping the set size at most beta.
  • If scorer is set to 'ucrispr' and beta is set to 0 or to a value that is too small, ALLEGRO will attempt to find the smallest possible beta for you within the allowed time (set by early_stopping_patience, which we discuss later, and provided that enable_solver_diagnostics is enabled, which it is by default). In short, when you change your scorer from 'dummy' to 'ucrispr', either specify a beta yourself, or leave it as 0 so that ALLEGRO can find the smallest beta for you and output the scored guides.
  • The point of beta is to allow ALLEGRO to select guides with better cutting efficacy at the cost of a larger set. For example, a certain guide may score 99.0 as determined by uCRISPR but target only a single gene, while another guide may score 45.0 and cut in 3 genes. If ALLEGRO is restricted by a small beta, it may include the second guide in its output, sacrificing the overall cutting efficacy. If a larger beta is given, ALLEGRO has more freedom to choose the 99.0-scoring guide, perhaps together with two more high-scoring guides, sacrificing the overall set size. This tradeoff is the essence of using beta and a guide scorer.
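The tradeoff can be illustrated with a toy brute-force search. This is only a sketch under stated assumptions: the guide names, scores, and gene sets are made up, the objective is assumed to be the sum of guide scores, and every gene must be cut at least once. ALLEGRO itself solves this with an LP/ILP solver, not by enumeration.

```python
from itertools import combinations

# Hypothetical toy data: guide name -> (score, genes it cuts).
GUIDES = {
    "g_broad": (45.0, {"gene1", "gene2", "gene3"}),
    "g_a": (99.0, {"gene1"}),
    "g_b": (90.0, {"gene2"}),
    "g_c": (85.0, {"gene3"}),
}
GENES = {"gene1", "gene2", "gene3"}


def best_set(beta):
    """Return (guides, total_score) for the highest-scoring set of at most
    `beta` guides that still cuts every gene, or (None, -inf) if none exists."""
    best, best_score = None, float("-inf")
    for k in range(1, beta + 1):
        for subset in combinations(sorted(GUIDES), k):
            covered = set().union(*(GUIDES[g][1] for g in subset))
            if covered != GENES:
                continue  # this subset leaves some gene uncut
            total = sum(GUIDES[g][0] for g in subset)
            if total > best_score:
                best, best_score = set(subset), total
    return best, best_score
```

With `beta=1` the only valid choice is the broad, low-scoring guide; with `beta=3` the search is free to pick the three high-scoring single-gene guides instead.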
  3. patterns_to_exclude or -pte
  • List of strings. ALLEGRO will only output guides that contain none of the IUPAC patterns in this list. Supports up to 5 chained IUPAC codes; e.g., 'RYSN'.
  • The exception to the 5-code limit above is when positional nucleotides are used in conjunction with 'N's. For example, entering 'NNNNNNNCNNNNGNNNN' will exclude guides with G and C in positions 4 and 9 distal to the PAM.
  • Supports individual nucleotides; e.g., 'TTTT' excludes guides with four consecutive Ts in their sequence (and consequently also excludes guides with runs of more than 4 Ts).
  • Be careful not to place common nucleotides or IUPAC codes here, such as just 'A' or 'AG', as you may end up excluding most or all guides from the calculation.
  • As another example, inputting 'WS' will exclude all guides with an A or a T followed by a G or a C.
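One way to picture how such patterns filter guides is to translate each IUPAC code into a regular-expression character class. The sketch below is an illustration only, not ALLEGRO's implementation: the function names are hypothetical, and it assumes a pattern excludes a guide when it matches anywhere in the guide sequence.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Compile an IUPAC pattern such as 'WS' into a regular expression."""
    return re.compile("".join(IUPAC[code] for code in pattern.upper()))


def filter_guides(guides, patterns_to_exclude):
    """Keep only guides that contain none of the excluded patterns."""
    regexes = [iupac_to_regex(p) for p in patterns_to_exclude]
    return [g for g in guides if not any(r.search(g) for r in regexes)]
```

For instance, `filter_guides(guides, ["TTTT"])` drops any guide containing a run of four or more Ts, while `["WS"]` drops nearly every realistic guide, which is why short patterns should be used with care.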
  4. output_offtargets or -off
  • Boolean True/(False) value. Setting it to True directs ALLEGRO to use Bowtie to align the output library against background fasta files. The files to align against are specified by input_species_offtarget_dir (-isod), the directory that should contain the background fasta files, and input_species_offtarget_column (-isoc), which tells ALLEGRO which column in the CSV file provided by input_species_path (from Basic Settings) contains the names of the files to align against. For example, if your input_species_path: 'input_species.csv' looks like the following:

    species_name filename offtarget_background
    test_fasta my_test_fasta.fna background.fna

    then input_species_offtarget_column should be set to 'offtarget_background'. You may also align the output library back to the input species fasta file ('my_test_fasta.fna') by specifying input_species_offtarget_column: 'filename' (thus not needing a third column at all).

  • Enabling this parameter makes ALLEGRO use seed_region_is_n_upstream_of_pam (-seed) and report_up_to_n_mismatches (-reportmm), found further down in config.yaml. ALLEGRO will then output a file called 'targets.csv' under your output experiment folder, containing guides from the output library that match a target exactly in the seed region upstream of the PAM but have up to report_up_to_n_mismatches mismatches in the seed-distal region of the target sequence.
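The way the directory and column settings combine can be sketched as a simple lookup. This is a hedged illustration: it assumes the CSV is comma-separated with the three columns shown in the table above, and the function name and the 'backgrounds' directory are hypothetical.

```python
import csv
import io
from pathlib import PurePosixPath

# Assumed comma-separated contents of input_species.csv (mirrors the table above).
INPUT_SPECIES_CSV = """species_name,filename,offtarget_background
test_fasta,my_test_fasta.fna,background.fna
"""


def offtarget_files(csv_text, offtarget_dir, offtarget_column):
    """Map each species name to the fasta file its library would be aligned against."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {
        row["species_name"]: str(PurePosixPath(offtarget_dir) / row[offtarget_column])
        for row in rows
    }
```

Passing `offtarget_column="offtarget_background"` selects 'background.fna', while `offtarget_column="filename"` selects 'my_test_fasta.fna', the input file itself.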

  5. report_up_to_n_mismatches or -reportmm
  • Integer value in the range [0, 3] inclusive, used only if output_offtargets: True. This is the '-v' parameter in Bowtie and cannot exceed 3. Mismatches are considered only in the seed-distal region, after the first seed_region_is_n_upstream_of_pam bases.
  6. preclustering or -prec
  • Boolean True/(False) value. Affects running time.

  • Allows a guide that differs from another guide by at most the set number of mismatches (outside the seed region) to "inherit" the second guide's targets, making the second guide redundant and reducing the total number of guides needed.

  • Works best when unscored guides are present (scorer: 'dummy') as it does not consider scores.

  • Uses seed_region_is_n_upstream_of_pam and mismatches_allowed_after_seed_region parameters.

  • Consider the following simple example where in 'my_test_fasta.fna' we have:

    >gene1
    AAAAGTCTGTATAGAGAAGTTGG
    >gene2
    CAAAGTCTGTATAGAGAAGTTGG
    >gene3
    TAAAGTCTGTATAGAGAAGTTGG
    

    The only difference between the sequences is the left-most nucleotide. Using the basic settings of track: 'track_e' and multiplicity: 1, ALLEGRO will output 3 guides, one per gene. With preclustering turned on, seed_region_is_n_upstream_of_pam: 12, and mismatches_allowed_after_seed_region: 1, ALLEGRO will output a single guide: AAAAGTCTGTATAGAGAAGT. The 3 guides have been pre-clustered into 1, as if this single guide targeted all 3 genes.
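The example above can be reproduced with a small greedy sketch. It is an assumption-laden illustration rather than ALLEGRO's actual clustering code: guides are 20-nt protospacers written 5' to 3' without the PAM (so the seed is the last `seed_len` bases), guides are visited in sorted order, and the function names are hypothetical.

```python
def within_mismatches(a, b, seed_len, max_mismatches):
    """True if a and b share the PAM-proximal seed exactly and differ in at
    most `max_mismatches` positions in the remaining seed-distal bases."""
    if len(a) != len(b) or not 0 <= seed_len <= len(a):
        return False
    split = len(a) - seed_len
    if a[split:] != b[split:]:
        return False
    return sum(x != y for x, y in zip(a[:split], b[:split])) <= max_mismatches


def precluster(guide_targets, seed_len, max_mismatches):
    """Greedily merge guides: each guide either joins the first compatible
    representative (which inherits its targets) or becomes a new one."""
    clusters = {}
    for guide in sorted(guide_targets):
        for representative in clusters:
            if within_mismatches(representative, guide, seed_len, max_mismatches):
                clusters[representative] |= guide_targets[guide]
                break
        else:
            clusters[guide] = set(guide_targets[guide])
    return clusters
```

Called with the three guides from the fasta example, `seed_len=12`, and `max_mismatches=1`, this returns a single representative, AAAAGTCTGTATAGAGAAGT, that carries all three genes as targets.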

  7. postclustering or -postc
  • Boolean True/(False) value. Affects running time. After a guide RNA library is generated as output, ALLEGRO clusters the guides in the output library using the two parameters seed_region_is_n_upstream_of_pam and mismatches_allowed_after_seed_region, and adds a column called "cluster" to the output CSV file. Guides in the same cluster differ from each other outside the seed region by no more than the allowed number of mismatches.
  • Post-clustering may not generate the same results as pre-clustering because some guides may not even be chosen to be in the final set before they are post-clustered.
  8. early_stopping_patience or -esp
  • Integer value, measured in seconds; defaults to 60. Only used when solving the ILP, which happens if there are remaining feasible guides with fractional values after solving the LP. ALLEGRO tells OR-Tools to stop searching for an optimal solution after this many seconds.
  • Increasing this value may, but is not guaranteed to, produce a smaller output set.
  • If a feasible solution is not found within this time frame, ALLEGRO will automatically restart the search with a larger patience (given that enable_solver_diagnostics is enabled).
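The restart behaviour can be pictured as a retry loop around the solver. The sketch below is hypothetical: this page does not say by how much the patience grows or how many restarts are attempted, so the doubling factor and the attempt cap are assumptions, and `try_solve` stands in for a real solver call.

```python
def solve_with_growing_patience(try_solve, patience=60, growth=2, max_attempts=4):
    """Call try_solve(patience_seconds) with a growing time budget.

    `try_solve` must return a solution, or None if no feasible solution was
    found in time. Returns (solution, patience_used), or (None, None) if
    every attempt failed.
    """
    for _ in range(max_attempts):
        solution = try_solve(patience)
        if solution is not None:
            return solution, patience
        patience *= growth  # assumed growth rule; not taken from ALLEGRO
    return None, None
```

As a usage example, a stand-in solver that only succeeds when given at least 200 seconds would be tried with 60, 120, and then 240 seconds.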
  9. enable_solver_diagnostics or -esd
  • Boolean (True)/False value. When a problem is deemed unsolvable (e.g., Status: MPSOLVER_INFEASIBLE), enabling diagnostics makes ALLEGRO relax each constraint in turn and re-solve the problem.
  • If the new problem with the relaxed constraint is solvable, ALLEGRO outputs the culprit gene/species.
  • Currently, to stop this process, you need to find the PID of the Python process running ALLEGRO using: $ top and kill it manually: $ kill -SIGKILL PID #11
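The relax-and-re-solve idea can be shown on a toy problem. This is a minimal sketch, not ALLEGRO's diagnostics code: it assumes one coverage constraint per gene plus a size budget beta, checks feasibility by brute force instead of calling OR-Tools, and uses made-up guide and gene names.

```python
from itertools import combinations


def is_feasible(guide_targets, genes, beta, multiplicity=1):
    """True if some set of at most `beta` guides cuts every gene in `genes`
    at least `multiplicity` times (brute-force check for tiny inputs)."""
    guides = sorted(guide_targets)
    for k in range(min(beta, len(guides)) + 1):
        for subset in combinations(guides, k):
            if all(
                sum(gene in guide_targets[g] for g in subset) >= multiplicity
                for gene in genes
            ):
                return True
    return False


def find_culprits(guide_targets, genes, beta, multiplicity=1):
    """Drop each gene's constraint in turn; a gene is reported as a culprit
    if the problem becomes feasible once its constraint is removed."""
    if is_feasible(guide_targets, genes, beta, multiplicity):
        return []
    return [
        gene
        for gene in sorted(genes)
        if is_feasible(guide_targets, genes - {gene}, beta, multiplicity)
    ]
```

With guides `{"guide_a": {"gene1", "gene2"}, "guide_b": {"gene3"}}` and `beta=1`, the problem is infeasible, and removing the constraint for gene3 is the only single relaxation that makes it solvable.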

End of Tutorial

Do not hesitate to create a GitHub issue if you read through this documentation and could not find an answer to your question/issue.