Welcome to fingeRNAt's README

fingeRNAt is a software tool for detecting non-covalent interactions formed within complexes of nucleic acids with ligands.


Overview

fingeRNAt is a Python 3 software tool for detecting non-covalent interactions formed within complexes of nucleic acids with ligands.

Interactions are encoded and saved, e.g., in the form of a bioinformatics-friendly Structural Interaction Fingerprint (SIFt): a binary string in which a bit is set to 1 if a particular interaction is present and to 0 otherwise. This enables high-throughput analysis of the interaction data using data analysis techniques.

Interactions can be calculated for the following complexes:

fingeRNAt runs under Python 3.5 - 3.9 on Linux and macOS.

Supplementary code and data regarding the manuscript can be found here.

What is the Structural Interaction Fingerprint (SIFt)?

Structural Interaction Fingerprint (SIFt) is a binary string describing the existence (1/0) of specified molecular interactions between all receptor's residues and the ligand (Deng et al., 2004).




SIFt translates information about 3D interactions in the receptor-ligand complex into a string, where the respective bit in the fingerprint is set to 1 in case of detecting particular interaction and to 0 otherwise.

Therefore, the interactions are represented in a unified fashion, thus allowing for easy high-throughput computational analysis, as they provide a full picture of all interactions within the complex.
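
As a minimal illustration of the encoding idea (not fingeRNAt's actual implementation), the sketch below turns a set of detected interactions into a binary string. The residue labels, the three interaction names, and the function `encode_sift` are hypothetical and chosen only for this example.

```python
# Hypothetical interaction names; the real FULL fingerprint uses more types.
INTERACTIONS = ["HB", "HAL", "CA"]


def encode_sift(residues, detected):
    """Return a binary string with one bit per (residue, interaction) pair.

    `detected` is a set of (residue, interaction) tuples that were observed.
    """
    bits = []
    for residue in residues:
        for interaction in INTERACTIONS:
            bits.append("1" if (residue, interaction) in detected else "0")
    return "".join(bits)


# Toy data: two residues, one hydrogen bond and one cation-anion contact.
residues = ["A:1:G", "A:2:C"]
detected = {("A:1:G", "HB"), ("A:2:C", "CA")}
fingerprint = encode_sift(residues, detected)
print(fingerprint)  # 100001
```

Because every complex is described by a string of the same length and bit order, fingerprints of different ligands or poses can be compared position by position.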

Installation

Recommended fingeRNAt usage is in a conda environment.

Conda environment (the recommended method)


Tested under Debian (11 stable), Ubuntu (18.04, 20.04, and 21.10), and macOS (10.15 and 11).

  1. Install conda

    Please refer to the conda manual and install the conda version with Python 3.x according to your operating system.

  2. Download fingeRNAt

    Clone the repository

    git clone --depth=1 https://github.com/n-szulc/fingernat.git

    Or

    Download the latest stable release from the releases page.

  3. Create conda environment

    conda env create -f fingeRNAt/env/fingeRNAt_env.yml

Using apt-get


To install fingeRNAt on Debian and Debian-like systems using repository packages (tested under Debian 11 stable and Ubuntu 20.04):

# install packages
sudo apt-get update && sudo apt-get --no-install-recommends -y install openbabel python3.9-minimal python3-openbabel python3-pip python-is-python3 \
python3-pandas python3-numpy python3-rdkit python3-tqdm python3-yaml

# clone the fingeRNAt repository:
git clone --depth=1 https://github.com/n-szulc/fingernat.git

Using pip and apt-get


To install fingeRNAt on Debian and Debian-like systems using repository packages and pip-installed packages (tested under Debian 11 stable and Ubuntu 20.04):

# install a minimal python and openbabel tool box:
apt-get update && apt-get --no-install-recommends -y install openbabel python3.9-minimal python3-openbabel python3-pip python-is-python3

# clone the fingeRNAt repository (it contains the pip requirements file):
git clone --depth=1 https://github.com/n-szulc/fingernat.git

# install python packages (run from the root of the cloned repository):
pip install -r env/fingeRNAt_pip.txt

Singularity image

A Singularity image with the fingeRNAt suite is available in the Sylabs cloud: cloud.sylabs.io.

To fetch the latest image directly, run:

singularity pull library://filips/default/fingernat:latest

For usage examples of the image, see the Singularity image section below.

Manual installation

Required dependencies are:

  • python 3 (tested on versions 3.5, 3.6, 3.7, 3.8, 3.9)
  • openbabel 3.1.1
  • numpy
  • pandas
  • rdkit
  • pyyaml
  • tk
  • tqdm
  • sphinx

Usage

Quick start ⚡

To call fingeRNAt with example inputs:

conda activate fingernat

cd fingeRNAt

python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf

See the output file with SIFts in the outputs/ directory.

fingeRNAt in action

See the basic usage of fingeRNAt (asciicast recording).

Parameters description

fingeRNAt accepts the following parameters:

| Parameter | Description |
| --- | --- |
| -r | path to RNA/DNA structure; see -> Inputs |
| [-l] | path to ligands' file; see -> Inputs |
| [-f] | optional Structural Interactions Fingerprint (SIFt) type; see -> SIFt types; available types are: FULL [default], SIMPLE, PBS |
| [-addH] | optional module name to be used to add hydrogens to ligands' structures; see -> Additional notes; available modules are: OpenBabel [default], RDKit, None |
| [-wrapper] | optional SIFt results' wrapper; see -> Wrappers; available types are: ACUG, PuPy, Counter |
| [-o] | optional path to save output |
| [-h2o] | optional detection of water-mediated interactions; applies only to SIFt type FULL and its wrappers; if not passed, all columns containing information about water-mediated interactions are empty (None) |
| [-dha] | optional Donor-Hydrogen-Acceptor angle calculation when detecting hydrogen bonds; see -> Hydrogen Bonds |
| [-custom] | path to yaml file with information about additional interactions to be calculated; see -> User-defined interactions |
| [-fingerDISt] | fingerDISt Distance Metrics to be calculated (fingerDISt will be directly run on the SIFts output file); see -> Usage examples |
| [-print] | print detected interactions for each nucleic acid - ligand complex on screen |
| [-detail] | generate an additional file with detailed data on detected interactions; see -> PyMOL visualization |
| [-verbose] | provides additional information about performed calculations at the given moment |
| [-debug] | enters debug mode; see -> Debugging mode |
| [-h] | show help message |

Inputs

  1. -r : path to receptor - RNA/DNA structure

    • supported file type: pdb
    • only 1 model of RNA/DNA structure
      • if there are more models, you have to choose only one (e.g., manually delete remaining models)
    • no ligands
    • may contain water & ions

    🔴 Hydrogens need to be added

  2. -l: optional path to ligand - small molecule, RNA/DNA/LNA, or protein

    • small molecule ligands
      • supported file types: sdf
      • possible multiple poses of ligands in one file
    • RNA/DNA structure
      • supported file type: sdf
      • possible multiple models of RNA/DNA structure
      • only RNA/DNA chains
        • no water, ions, ligands

    🔵 If -l is not specified, fingeRNAt will find all inorganic ions in the receptor file and treat them as ligands; see -> Interactions with inorganic ions.
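
Since the receptor file may contain only one model, a multi-model PDB file has to be trimmed first. The following is a minimal sketch of one way to do that, assuming the file uses standard MODEL/ENDMDL records; the helper `first_model_lines` is hypothetical and not part of fingeRNAt, and the ATOM records below are truncated toy lines. Hydrogens still have to be added and ligands removed separately.

```python
def first_model_lines(pdb_lines):
    """Keep everything up to and including the first ENDMDL record.

    Files without ENDMDL records are returned unchanged.
    """
    kept = []
    for line in pdb_lines:
        kept.append(line)
        if line.startswith("ENDMDL"):
            kept.append("END")  # close the trimmed file
            break
    return kept


# Toy input with two models (ATOM records shortened for readability).
multi_model = [
    "MODEL        1",
    "ATOM      1  P     G A   1 (model 1 coordinates)",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  P     G A   1 (model 2 coordinates)",
    "ENDMDL",
    "END",
]
single_model = first_model_lines(multi_model)
print("\n".join(single_model))
```

For a real file, the lines could be read with `open(path).read().splitlines()` and the result written to a new pdb file that is then passed to -r.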

Additional notes:

  • In the receptor molecule, charges on the phosphate groups do not need to be assigned (fingeRNAt always treats OP1 and OP2 atoms as negatively charged anions).
  • Receptor's residues with fewer than four atoms are not considered in calculations.
  • Input ligand molecules should have the desired protonation state and formal charges assigned. For sdf ligand files converted from pdbqt/mol2 files, please check that the formal charges are preserved in the sdf files.
  • All the missing ligands' hydrogens will be automatically added unless -addH None is passed.
  • Ligands with added hydrogens will be saved to a new sdf file (to the same directory as input) with _OB_addedH or _RDKit_addedH suffix, depending on the selected module for -addH; for -addH None no new sdf file will be saved, as it does not add hydrogens.

Structural Interaction Fingerprint (SIFt) types

fingeRNAt can calculate the following SIFt types:

  • FULL

    Calculates nine non-covalent interactions for each RNA/DNA residue - ligand pair:

    • hydrogen bonds (HB)
    • halogen bonds (HAL)
    • cation - anion (CA)
    • Pi - cation (Pi_Cation)
    • Pi - anion (Pi_anion)
    • Pi - stacking (Pi_Stacking) interactions
    • ion-mediated; distinguishes between:
      • Magnesium-mediated (Mg_mediated)
      • Potassium-mediated (K_mediated),
      • Sodium-mediated (Na_mediated),
      • Other ion-mediated (Other_mediated)
    • water-mediated (Water_mediated); only if the -h2o parameter was passed; otherwise this interaction is assigned as None
    • lipophilic (lipophilic_mediated)

    🔶Returns twelve 0/1 values for each residue.

NOTE: It is possible to calculate more interactions specified by the user; see -> User-defined interactions

  • SIMPLE

    Calculates distances between each RNA/DNA residue and the ligand; returns 1 if the distance does not exceed the declared threshold (default = 4.0 Å), 0 otherwise. Does not take into account distances between hydrogens or hydrogen - heavy atom.

    🔶Returns one 0/1 value for each residue.

  • PBS

    Divides each RNA/DNA residue into three groups: Phosphate, Base, Sugar. Then, for each group separately, calculates the distance to the ligand; returns 1 if the distance does not exceed the declared threshold (default = 4.0 Å), 0 otherwise. Does not take into account distances between hydrogens or hydrogen - heavy atom.

    🔶Returns three 0/1 values for each residue.

    NOTE: Only for RNA/DNA with canonical residues.
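
The contact-based SIMPLE and PBS types can be pictured with the sketch below. It is an illustration under stated assumptions, not fingeRNAt's code: "distance" is taken here as the smallest heavy-atom to heavy-atom distance, atoms are hypothetical `(element, (x, y, z))` tuples, and the coordinates are made up.

```python
import math

CUTOFF = 4.0  # default threshold in Å, as stated above


def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def in_contact(group_atoms, ligand_atoms, cutoff=CUTOFF):
    """Return 1 if any heavy-atom pair is within `cutoff`, else 0."""
    for elem_a, xyz_a in group_atoms:
        if elem_a == "H":  # pairs involving hydrogens are ignored
            continue
        for elem_b, xyz_b in ligand_atoms:
            if elem_b == "H":
                continue
            if distance(xyz_a, xyz_b) <= cutoff:
                return 1
    return 0


# One toy residue split into the three PBS groups, and a toy ligand.
residue_groups = {
    "Phosphate": [("P", (0.0, 0.0, 0.0)), ("O", (1.0, 0.0, 0.0))],
    "Base": [("N", (10.0, 0.0, 0.0))],
    "Sugar": [("C", (3.0, 0.0, 0.0)), ("H", (6.5, 0.0, 0.0))],
}
ligand = [("C", (8.0, 0.0, 0.0)), ("H", (7.0, 0.0, 0.0))]

# PBS: one bit per group; SIMPLE: one bit for the whole residue.
pbs_bits = [in_contact(residue_groups[g], ligand) for g in ("Phosphate", "Base", "Sugar")]
all_atoms = [atom for atoms in residue_groups.values() for atom in atoms]
simple_bit = in_contact(all_atoms, ligand)
print(pbs_bits, simple_bit)  # [0, 1, 0] 1
```

Note that the sugar hydrogen lies 1.5 Å from the ligand carbon, but the Sugar bit stays 0 because hydrogens are skipped.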

Interactions with inorganic ions

It is possible to calculate contacts for nucleic acid - ions.

Ions must be in the same input pdb file, and parameter -l should be omitted. fingeRNAt will treat all inorganic ions as ligands and calculate SIFt for each residue - ion pair.

As the aforementioned interactions are detected based on contacts, only SIFt type SIMPLE or PBS may be calculated.

Usage example

python code/fingeRNAt.py -r example_inputs/3d2v.pdb -f PBS

Sample output

see -> 3d2v.pdb_IONS_SIMPLE.tsv

User-defined interactions

The user may define custom interactions to be detected. SMARTS patterns are used to define interacting atoms, and interaction definitions are encoded in a simple yaml file.

The interactions will be added as new columns to the standard SIFts outputs (also works with all the wrappers) or as new rows to -detail outputs and can be visualized using our PyMOL plugin.

Examples of various SMARTS patterns are available at daylight.com. To check if the defined SMARTS will hit desired atoms/groups, you may want to use our Jupyter Notebook.

Sample YAML file is provided with the fingeRNAt code. To validate your yaml file syntax, you can use an online validator.

NOTE: Additional interactions can be calculated only for the fingerprint type FULL.

Three types of interactions are currently supported:

Point-point interactions, distance only

Given:

  • 1 SMARTS for the receptor
  • 1 SMARTS for the ligand
  • minimum and maximum distance

Detects atoms fulfilling SMARTS conditions for the receptor and all ligands.

Checks if the distance between each pair of such atoms is within the provided distance range. If so, the interaction is detected.

Example:

NA-AA:
  Receptor_SMARTS:
    - '[!#1]'
  Ligand_SMARTS:
    - '[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]'
  Distance:
    min:                0.5
    max:                3.5
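
The distance-only check can be sketched as follows. This assumes the SMARTS matching has already been done and only coordinates of the matched atoms remain; the function `detect_pairs`, the inclusive treatment of the range limits, and the coordinates are assumptions made for this illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def detect_pairs(receptor_xyz, ligand_xyz, d_min, d_max):
    """Return index pairs (i, j) whose distance lies within [d_min, d_max]."""
    hits = []
    for i, r_atom in enumerate(receptor_xyz):
        for j, l_atom in enumerate(ligand_xyz):
            if d_min <= distance(r_atom, l_atom) <= d_max:
                hits.append((i, j))
    return hits


# Made-up coordinates of atoms that matched the receptor and ligand SMARTS.
receptor_hits = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ligand_hits = [(3.0, 0.0, 0.0), (0.2, 0.0, 0.0)]
pairs = detect_pairs(receptor_hits, ligand_hits, d_min=0.5, d_max=3.5)
print(pairs)  # [(0, 0)]
```

The second ligand atom is closer than the minimum distance to the first receptor atom, so that pair is rejected even though it is well below the maximum.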

Point-point interactions, distance and angle

Variant 1

Given:

  • 1 SMARTS for the receptor
  • 2 SMARTS for the ligand
  • minimum and maximum distance
  • minimum and maximum angle

Detects atoms fulfilling SMARTS conditions for the receptor and all ligands.

Checks if the distance between atoms fulfilling receptor's SMARTS and ligand's first SMARTS is within the provided distance range. If so, the angle between the three atoms is calculated. If its value is within the provided angle range, the interaction is detected.

Example:

weak_hbond_Don-Lig__Acc-NA:
  Receptor_SMARTS:
    - '[!$([#1,#6,F,Cl,Br,I,o,s,nX3,#7v5,#15v5,#16v4,#16v6,*+1,*+2,*+3])]' # HBA
  Ligand_SMARTS:
    - '[#1;$([#1]-[C])]'   # hydrogen connected to C
    - '[#6!H0]'            # C in hydrogen bond donor
  Distance:
    # H···O
    min: 0.5
    max: 3.05
  Angle1:
    # PROTEINS: Structure, Function, and Bioinformatics 67:128–141 (2007)
    # C–H···O
    min: 90
    max: 180

Variant 2

Given:

  • 2 SMARTS for the receptor
  • 1 SMARTS for the ligand
  • minimum and maximum distance
  • minimum and maximum angle

Detects atoms fulfilling SMARTS conditions for the receptor and all ligands.

Checks if the distance between atoms fulfilling receptor's second SMARTS and ligand's SMARTS is within the provided distance range. If so, the angle between the three atoms is calculated. If its value is within the provided angle range, the interaction is detected.

Example:

weak_hbond_Don-NA__Acc-Lig:
  Receptor_SMARTS:
    - '[#6!H0]'              # C in hydrogen bond donor
    - '[#1;$([#1]-[C])]'    # hydrogen connected to C
  Ligand_SMARTS:
    - '[!$([#1,#6,F,Cl,Br,I,o,s,nX3,#7v5,#15v5,#16v4,#16v6,*+1,*+2,*+3])]' # HBA
  Distance:
    # H···O
    min: 0.5
    max: 3.05
  Angle1:
    # PROTEINS: Structure, Function, and Bioinformatics 67:128–141 (2007)
    # C–H···O
    min: 90
    max: 180
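
The distance-and-angle test for the weak hydrogen bond examples above can be illustrated with a short sketch. It assumes, based on the "C–H···O" comment in the YAML, that the angle is measured at the hydrogen; the function names and coordinates are hypothetical, and the default limits are simply copied from the example definition.

```python
import math


def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def angle_deg(a, vertex, c):
    """Angle a-vertex-c in degrees."""
    v1 = [p - q for p, q in zip(a, vertex)]
    v2 = [p - q for p, q in zip(c, vertex)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    cos_theta = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


def weak_hbond(donor_c, hydrogen, acceptor,
               d_min=0.5, d_max=3.05, a_min=90, a_max=180):
    """Check the H···acceptor distance first, then the C-H···acceptor angle."""
    if not d_min <= distance(hydrogen, acceptor) <= d_max:
        return False
    return a_min <= angle_deg(donor_c, hydrogen, acceptor) <= a_max


# Made-up geometries: a linear arrangement and a strongly bent one.
linear = weak_hbond((0.0, 0.0, 0.0), (1.09, 0.0, 0.0), (3.5, 0.0, 0.0))
bent = weak_hbond((0.0, 0.0, 0.0), (1.09, 0.0, 0.0), (0.09, 1.732, 0.0))
print(linear, bent)  # True False
```

Checking the distance before the angle mirrors the order described above and avoids computing angles for pairs that are already out of range.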

Point-point interactions, distance and two angles

Given:

  • 2 SMARTS for the receptor
  • 2 SMARTS for the ligand
  • minimum and maximum distance
  • minimum and maximum angle no 1
  • minimum and maximum angle no 2

Detects atoms fulfilling SMARTS conditions for the receptor and all ligands.

Checks if the distance between atoms fulfilling receptor's second SMARTS and first ligand's SMARTS is within the provided distance range. If so, the angle between the receptor's atom 1 - receptor's atom 2 - ligand's atom 1 is calculated. If its value is within the provided angle no 1 range, the angle between the receptor's atom 2 - ligand's atom 1 - ligand's atom 2 is calculated. If its value is within the provided angle no 2 range, the interaction is detected.

NOTE: Atoms matched by any of the receptor's SMARTS are not considered if they do not belong to the receptor (e.g., ions/water present in the structure) or if they belong to a receptor's residue with fewer than four atoms.

NOTE 2: If two SMARTS are given for the receptor/ligand, receptor's atom 1 and receptor's atom 2 (and/or ligand's atom 1 and ligand's atom 2) will be considered only if bound (this will be detected by the fingeRNAt).

Example:

multipolar_halogen_bond:
  Receptor_SMARTS:
    # carbonyl oxygen (non-bonding)
    - '[$([OH0]=[CX3,c]);!$([OH0]=[CX3,c]-[OH,O-])]'
    # carbonyl carbon, forms the bond
    - '[$([CX3,c]=[OH0]);!$([CX3,c](=[OH0])-[OH,O-])]'
  Ligand_SMARTS:
    - '[F,Cl,Br,I]'    # halogen, forms the bond
    - '[#6]'          # any carbon atom connected to the halogen
  Distance:
    min: 0.5
    max: 3.65
  Angle1:
    min: 70     # receptor, theta2 - O=C⋯X
    max: 110    # receptor, theta2 - O=C⋯X
  Angle2:
    min: 90     # ligand, theta1 - C⋯X-#6
    max: 180    # ligand, theta1 - C⋯X-#6

User-defined thresholds

All the default thresholds can be changed in code/config.py.

Outputs

Outputs are saved to tsv files. Tsv is a simple text format similar to csv, except for the data being tab-separated instead of comma-separated.

If fingeRNAt is run without the optional parameter -o, the script creates an outputs/ directory in the working directory and saves the output there in tsv format. Otherwise, fingeRNAt saves the outputs in the user-specified location.

Example outputs for different SIFt types, their wrappers, and -detail are available from fingeRNAt/example_outputs.

FULL

Sample extract of output of running python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -h2o

See sample output -> 1aju_model1.pdb_ligands.sdf_FULL.tsv

NOTE: If fingeRNAt was called without -h2o parameter, all columns containing information about water-mediated interactions are empty (None) (applies also for wrappers; (see -> 'Parameters description')).

SIMPLE

Sample extract of output of running python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -f SIMPLE

See sample output -> 1aju_model1.pdb_ligands.sdf_SIMPLE.tsv

PBS

Sample extract of output of running python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -f PBS

See sample output -> 1aju_model1.pdb_ligands.sdf_PBS.tsv

Wrappers

Calculated SIFt (of any type) can be wrapped, which allows for representing it in lower resolution.

The results of the SIFt calculations and all passed wrappers are saved to separate tsv files. Multiple wrappers may be passed at once (comma-separated; see -> 'Usage examples').

Three types of wrappers are available.

ACUG

Wraps calculated results according to a nucleotide. Provides information if a particular kind of interaction between, e.g., any adenine from RNA/DNA and ligand occurred.

Sample extract of output of running python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -f PBS -wrapper ACUG

See sample output -> 1aju_model1.pdb_ligands.sdf_PBS_ACUG.tsv

PuPy

Wraps calculated results according to nucleobase type (purine or pyrimidine). Provides information if a particular kind of interaction between, e.g., any purine from RNA/DNA and ligand occurred.

Sample extract of output of running python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -wrapper PuPy

See sample output -> 1aju_model1.pdb_ligands.sdf_FULL_PuPy.tsv

NOTE: As -h2o parameter was not passed, the columns containing information about water-mediated interactions are empty (None) (see -> 'Parameters description').

Counter

Counts the total number of given interaction types.

Sample extract of output of running python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -wrapper Counter

See sample output -> 1aju_model1.pdb_ligands.sdf_FULL_Counter.tsv

NOTE: As -h2o parameter was not passed, the columns containing information about water-mediated interactions are empty (None) (see -> 'Parameters description')
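
The three wrappers can be pictured as different ways of grouping the per-residue bits. The sketch below is a simplified illustration, not fingeRNAt's implementation: the "residue -> {interaction: bit}" layout, the residue labels, and the helper `wrap` are hypothetical.

```python
from collections import defaultdict

# Toy fingerprint for one ligand with two interaction types per residue.
sift = {
    "A:1:G": {"HB": 1, "CA": 0},
    "A:2:C": {"HB": 0, "CA": 0},
    "A:3:G": {"HB": 1, "CA": 1},
    "A:4:U": {"HB": 0, "CA": 0},
}

PURINES = {"A", "G"}


def nucleotide(residue):
    """Extract the nucleotide letter from a 'chain:number:name' label."""
    return residue.split(":")[-1]


def purine_or_pyrimidine(residue):
    return "Pu" if nucleotide(residue) in PURINES else "Py"


def everything(residue):
    return "total"


def add(a, b):
    return a + b


def wrap(fingerprint, key_func, combine):
    """Group residues by key_func and merge their bits with combine."""
    grouped = defaultdict(dict)
    for residue, bits in fingerprint.items():
        key = key_func(residue)
        for interaction, bit in bits.items():
            previous = grouped[key].get(interaction, 0)
            grouped[key][interaction] = combine(previous, bit)
    return {key: dict(values) for key, values in grouped.items()}


acug = wrap(sift, nucleotide, max)            # "did any G form an HB?"
pupy = wrap(sift, purine_or_pyrimidine, max)  # "did any purine form an HB?"
counter = wrap(sift, everything, add)["total"]  # how many HBs in total?
print(acug["G"], pupy, counter)
```

Using `max` gives a logical OR over the grouped residues, which keeps the ACUG and PuPy results binary, while summing produces the counts of the Counter wrapper.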

Detail mode

fingeRNAt has an additional -detail mode, which saves all detected interactions, for any SIFt type, to a separate tsv file (with the prefix DETAIL_).

The interactions saved in -detail mode follow the same convention - each row contains information about the interaction between given ligand-receptor, thus allowing for their easy, high-throughput processing.

The abovementioned data also serve as input to the dedicated PyMOL plugin, created by us, to visualize all interactions detected by the fingeRNAt (see -> PyMOL visualization).

NOTE: In the case of ion- and water-mediated interactions, two rows will correspond to one such interaction: (i) ligand - ion/water molecule, and (ii) ion/water molecule - RNA/DNA. Therefore in case (i) ion/water molecule is treated as the receptor and in case (ii) as the ligand.

NOTE 2: We refer to ligands as structures from ligands' file -l, as water molecules and inorganic ions are supposed to be provided in the RNA/DNA input pdb file.

The following data are saved in -detail mode:

  • Ligand_name
    • for ligands from sdf file: ligand's name
    • for inorganic ions: ion's name
    • for water molecules: HOH
  • Ligand_pose
    • for ligands from sdf file: ligand's pose number (indexed from 1)
    • for inorganic ions/water molecules: 0
  • Ligand_occurrence_in_sdf
    • if -l ligands' file was passed
      • for ligands from sdf file: ligand's occurrence from the beginning of sdf file
      • for inorganic ions/water molecules: ligand's occurrence (from the beginning of sdf file), for which they mediate the interaction with nucleic acid
    • if -l ligands' file was not passed: 0 (see -> Interactions with inorganic ions)
  • Interaction: interaction type
  • Ligand_Atom
    • for ligands from sdf file: ligand's atom index
    • for inorganic ions/water molecules: residue number : chain
  • Ligand_X/Ligand_Y/Ligand_Z: coordinates of ligand's/ion's/water molecule's atom
  • Receptor_Residue_Name/Receptor_Number/Receptor_Chain:
    • for interactions ligand/inorganic ions/water molecule - nucleic acid: receptor's nucleotide name/receptor's residue number/receptor's chain
    • for interactions ligand - ion: ion's name/ion's residue number/ion's chain
    • for interactions ligand - water molecule: HOH/water molecule's residue number/water molecule's chain
  • Receptor_Atom
    • for interactions ligand/inorganic ions/water molecule - nucleic acid: receptor's atom ID
    • for interactions ligand - ion: ion's name
    • for interactions ligand - water molecule: O
  • Receptor_X/Receptor_Y/Receptor_Z: coordinates of receptor's/ion's/water molecule's atom
  • Distance: distance between ligand's and residue's atoms [Å]

See sample output -> DETAIL_1aju_model1.pdb_ligands.sdf_FULL.tsv
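
Because each row of the -detail output describes one interaction, the file is easy to post-process with the standard library. The sketch below counts interactions per ligand pose; it uses a subset of the column names listed above, while the rows themselves (ligand names, interaction labels, distances) are made up for illustration.

```python
import csv
import io
from collections import Counter

# Made-up rows; a real file contains all the columns described above.
detail_tsv = (
    "Ligand_name\tLigand_pose\tInteraction\tReceptor_Residue_Name\tReceptor_Number\tDistance\n"
    "lig1\t1\tHB\tG\t12\t2.9\n"
    "lig1\t1\tCA\tC\t13\t3.8\n"
    "lig1\t2\tHB\tG\t12\t3.1\n"
)


def count_interactions(handle):
    """Count rows per (ligand name, pose, interaction type)."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        counts[(row["Ligand_name"], row["Ligand_pose"], row["Interaction"])] += 1
    return counts


counts = count_interactions(io.StringIO(detail_tsv))
for key, number in sorted(counts.items()):
    print(key, number)
```

To process a real file, pass an open file handle (opened with `newline=""`) instead of the `io.StringIO` object.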

PyMOL visualization

Detected interactions can be visualized using the dedicated PyMOL plugin, available in the plugin repository. To visualize interactions in this plugin, use the -detail outputs mode.

Usage examples

  • Calculate SIFt FULL, print detected interactions on screen, and save the output in the default location with the default filename.

python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -print

  • Calculate SIFt FULL with user-defined interactions and a table containing details on each detected interaction with the default filenames in the outputs directory.

python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -custom code/custom-interactions.yaml -detail

  • Calculate SIFt SIMPLE and save the output in the user-declared location with the default filename.

python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -f SIMPLE -o /path/to/my_output/

  • Calculate SIFt PBS, see what is being calculated using the verbose mode, and save the output with the default filename in the outputs/ directory.

python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -f PBS -verbose

  • Calculate default SIFt FULL and save its output along with a table containing details on each detected interaction with the default filenames in the outputs directory.

python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -detail

  • Calculate default SIFt FULL, consider water-mediated interactions, and save its output and three wrapped outputs with the default filenames in the outputs directory.

python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -h2o -wrapper ACUG,PuPy,Counter

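  • Calculate SIFt PBS for the inorganic ions found in the receptor file (no -l passed; see -> Interactions with inorganic ions) and save the output with the default filename in the outputs directory.

python code/fingeRNAt.py -r example_inputs/3d2v.pdb -f PBS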

  • Bonus: useful bash command to transpose resulting fingerprint file (rs program needed: sudo apt install rs for Debian-like systems):

cat fingerprint_FULL.tsv | sed -e "s/\t/,/g" | rs -c, -C, -T|sed -e 's/.$//' -e "s/,/\t/g" > fingerprint_FULL-T.tsv
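
If the rs program is not available, the transposition can also be done in Python. The sketch below is a possible standard-library alternative, assuming every row of the tsv file has the same number of fields; the function `transpose_tsv` and the toy file contents are hypothetical.

```python
import csv
import os
import tempfile


def transpose_tsv(src_path, dst_path):
    """Write the rows of a tab-separated file as columns of a new file."""
    with open(src_path, newline="") as src:
        rows = list(csv.reader(src, delimiter="\t"))
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerows(zip(*rows))


# Demo on a throwaway file with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "fingerprint.tsv")
    dst = os.path.join(tmp, "fingerprint-T.tsv")
    with open(src, "w", newline="") as handle:
        handle.write("Ligand\tres1\tres2\nlig1\t1\t0\nlig2\t0\t1\n")
    transpose_tsv(src, dst)
    with open(dst) as handle:
        transposed = handle.read()

print(transposed)
```

For the file from the bash example, the call would be `transpose_tsv("fingerprint_FULL.tsv", "fingerprint_FULL-T.tsv")`.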

Graphical User Interface

To use the Graphical User Interface (GUI), simply run

python code/gui.py

GUI is user-friendly and has all the aforementioned functionalities.

Debugging mode

fingeRNAt has a debugging mode in which it prints on screen exhaustive information about detected interactions.

The debugging mode may be used with each SIFt type and provides the following information:

  • FULL

    Prints the following properties of each ligand:

    1. atom indices of hydrogen bonds acceptors & donors
    2. atom indices of halogen bonds donors
    3. atom indices of cations & anions
    4. atom indices of ligand's aromatic rings
    5. IDs of inorganic ions in electrostatic contact with a ligand
    6. IDs of water molecules in contact with a ligand
    7. atom indices of lipophilic atoms
    8. atom indices for each user-defined interaction (single atoms if 1 SMARTS for ligand and tuples if 2 SMARTS for ligand were passed)

    Prints the following properties for each residue of nucleic acid:

    1. atom IDs of hydrogen bonds acceptors & donors
    2. atom IDs of anions
    3. atom IDs of aromatic rings
    4. atom IDs for each user-defined interaction (single atoms if 1 SMARTS for receptor and tuples if 2 SMARTS for receptor were passed)

    Prints detected interactions for pair of each nucleic acid residue - ligand:

    1. atoms creating hydrogen bond with their distance and angle (if parameter -dha was passed), separately for cases when the nucleic acid is hydrogen bond acceptor and hydrogen bond donor
    2. atoms creating halogen bond with their distance and angles
    3. atoms creating cation-anion interaction with their distance (note that nucleic acid's atoms are anions)
    4. atoms creating Pi-cation interaction with their distance and angle
    5. atoms creating Pi-anion interaction with their distance and angle (note that ligand's atoms are anions)
    6. atoms creating anion-Pi interaction with their distance and angle (note that nucleic acid's atoms are anions)
    7. atoms creating Pi-stacking interaction type Sandwich/Displaced with their distance, offset, and angle
    8. atoms creating Pi-stacking interaction type T-shaped with their distance, offset, and angle
    9. atoms creating ion-mediated interaction with their distances (nucleic acid - ion & ion - ligand)
    10. atoms creating water-mediated interaction with their distances (nucleic acid - water & water - ligand)
    11. atoms creating lipophilic interaction with their distance
    12. atoms creating each user-defined interaction with their distance and/or angle(s)
  • SIMPLE

    For each detected contact of nucleic acid residue-ligand prints information about atoms IDs and the distance between them.

  • PBS

    For each detected contact of nucleic acid residue's group - ligand prints information about atoms IDs and the distance between them.

NOTE: In all the above cases, only the first detected interaction of a given type is printed, as fingeRNAt stops searching for more once it has detected one particular interaction.

Warnings/Errors

Please pay attention to the following types of errors: Could not sanitize molecule ending on line ....

This means that the RDKit library used by the fingeRNAt cannot properly read the molecule.

e.g.

[08:46:45] non-ring atom 39 marked aromatic
[08:46:45] ERROR: Could not sanitize molecule ending on line 168
[08:46:45] ERROR: non-ring atom 39 marked aromatic
  • Error: non-ring atom ... marked aromatic
    • Solution: please make sure that the mentioned molecule has a proper aromatic ring representation.
  • Error: Explicit valence for atom # ..., is greater than permitted (e.g., Explicit valence for atom # 18 O, 3, is greater than permitted)
    • Solution: please make sure that the indicated atom(s) have a proper valence number, i.e., they form a correct number of bonds.

Frequently Asked Questions (FAQ)

What happens when I have a non-canonical nucleotide in my nucleic acid?

  • If you have a residue with only a non-canonical name (all atom names are canonical), e.g., X

| | FULL | SIMPLE | PBS |
| --- | --- | --- | --- |
| No wrapper | OK | OK | OK |
| ACUG | Omits interaction for residue with a non-canonical name | Omits interaction for residue with a non-canonical name | Omits interaction for residue with a non-canonical name |
| PuPy | Omits interaction for residue with a non-canonical name | Omits interaction for residue with a non-canonical name | Omits interaction for residue with a non-canonical name |
| Counter | OK | OK | OK |

  • If you have a residue with a canonical name but with a non-canonical atom name, e.g., P9

| | FULL | SIMPLE | PBS |
| --- | --- | --- | --- |
| No wrapper | OK | OK | Does not work |
| ACUG | OK | OK | Does not work |
| PuPy | OK | OK | Does not work |
| Counter | OK | OK | Does not work |

NOTE: We consider both oxygens from the phosphate group (OP1 and OP2) of nucleic acid as negatively charged, therefore fingeRNAt will not consider differently named atoms as anions.

  • If you have a residue with a non-canonical name and non-canonical atom name, e.g., P9

| | FULL | SIMPLE | PBS |
| --- | --- | --- | --- |
| No wrapper | OK | OK | Does not work |
| ACUG | Omits interaction for residue with a non-canonical name | Omits interaction for residue with a non-canonical name | Does not work |
| PuPy | Omits interaction for residue with a non-canonical name | Omits interaction for residue with a non-canonical name | Does not work |
| Counter | OK | OK | Does not work |

NOTE: We consider both oxygens from the phosphate group (OP1 and OP2) of nucleic acid as negatively charged, therefore fingeRNAt will not consider differently named atoms as anions.


What happens when I have nucleic acid with two residues with the same number, e.g., due to errors in structure?

In the case of SIFt types SIMPLE and PBS, the only difference is that the output will contain two columns with the same name. The SIFt and the wrapped results are correct.

However, in the case of SIFt type FULL, there will be two columns with the same name, but their Pi-interactions may be swapped, and their SIFt, as well as the wrapped results, may be unreliable.

fingerDISt 📏

fingerDISt is an additional, standalone tool that calculates different Distance Metrics based on Structural Interaction Fingerprint (SIFt) - outputs of fingeRNAt.

It calculates the selected Distance Metric for all SIFt vs. all SIFt from the input file - creates a matrix of scores and saves it to a tsv file.

Installation

Like fingeRNAt, fingerDISt requires Python 3.5 - 3.9. It may be run from within fingeRNAt's environment, but this is not required. No external modules are needed.

Usage

Quick start ⚡

cd fingeRNAt

python code/fingerDISt.py -i example_outputs/1aju_model1.pdb_ligands.sdf_FULL.tsv -m tanimoto

Parameters description

fingerDISt accepts the following parameters:

| Parameter | Description |
| --- | --- |
| -i | path to tsv/csv file with calculated SIFt; see -> Inputs |
| -m | types of desired Distance Metrics; see -> Distance Metrics |
| [-o] | optional path to save the output |
| [-verbose] | prints calculated Distance Metrics on the screen |
| [-h] | show help message |

Inputs

  1. -i : path to tsv/csv file with calculated SIFts

fingeRNAt outputs are fingerDISt inputs.

Distance Metrics

fingerDISt calculates the following Distance Metrics:

  • Tanimoto coefficient
  • Cosine similarity
  • Manhattan
  • Euclidean
  • Square Euclidean
  • Half Square Euclidean
  • Soergel
  • Tversky

Some Distance Metrics calculations were implemented based on the crux-fr-sprint code under the MIT license.

NOTE: Tanimoto coefficient works only for SIFt with binary values, therefore it may not work on input SIFt wrapped with Counter wrapper.

NOTE 2: It automatically replaces None with 0, meaning that Distance Metrics can be calculated for SIFt type FULL, which was called without -h2o parameter.

NOTE 3: The Tversky coefficient is not symmetric. By default, in the resulting matrix, the reference molecules are in columns and the compared molecules are in rows. It also uses hard-coded α and β coefficients with the widely used values α=1 and β=0 (e.g., see: Leung et al. "SuCOS is Better than RMSD for Evaluating Fragment Elaboration and Docking Poses"). To change this behavior or the coefficient values, please modify the function tversky(self, p_vec, q_vec) in the code/DistanceMetrics.py module.
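
To make the notes above concrete, here is a minimal sketch of the Tanimoto and Tversky coefficients for binary fingerprints, using their textbook definitions. It is not the code from code/DistanceMetrics.py: the function names, the choice of the first argument as the reference, and the value returned for all-zero inputs are assumptions of this example.

```python
def to_bits(values):
    """Replace None with 0, as described in NOTE 2."""
    return [0 if v is None else int(v) for v in values]


def tanimoto(p, q):
    """Shared on-bits divided by the on-bits present in either vector."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(p) + sum(q) - both
    return both / either if either else 0.0  # convention chosen for this sketch


def tversky(reference, compared, alpha=1.0, beta=0.0):
    """Tversky index; with alpha=1 and beta=0 only the reference is penalised."""
    both = sum(1 for a, b in zip(reference, compared) if a and b)
    only_ref = sum(1 for a, b in zip(reference, compared) if a and not b)
    only_cmp = sum(1 for a, b in zip(reference, compared) if b and not a)
    denominator = both + alpha * only_ref + beta * only_cmp
    return both / denominator if denominator else 0.0


# Made-up fingerprints; the None mimics a water-mediated column without -h2o.
p = to_bits([1, 1, 1, None])
q = to_bits([1, 0, 0, 0])
print(tanimoto(p, q))  # one shared bit out of three set bits in total
print(tversky(p, q))   # p as reference: two of its three bits are unmatched
print(tversky(q, p))   # q as reference: its single bit is matched
```

Swapping the arguments of `tversky` changes the result, which is why the orientation of the output matrix matters for this metric.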

Outputs

fingerDISt saves scores for each selected Distance Metric to separate tsv files - a simple text format similar to csv, except for the data being tab-separated instead of comma-separated.

If fingerDISt was run without optional parameter -o, the script will create outputs/ directory in the working directory and save there the output in tsv format. Otherwise, fingerDISt will save outputs in the user-specified location.

Sample output of running python code/fingerDISt.py -i tests/expected_outputs/1aju_model1.pdb_ligands.sdf_FULL.tsv -m tanimoto

Usage examples

  • Calculate all available Distance Metrics on SIFts inputs type FULL and save the output with the default filename in the outputs directory.

python code/fingerDISt.py -i tests/expected_outputs/1aju_model1.pdb_ligands.sdf_FULL.tsv -m manhattan,square_euclidean,euclidean,half_square_euclidean,cosine_similarity,tanimoto,soergel,tversky

  • Calculate two Distance Metrics on SIFts inputs type PBS wrapped with ACUG wrapper and save the output to a user-specified location.

python code/fingerDISt.py -i tests/expected_outputs/1aju_model1.pdb_ligands.sdf_PBS_ACUG.tsv -m manhattan,square_euclidean -o my_dir

  • Calculate one Distance Metric on SIFts inputs type SIMPLE, print it on the screen, and save the output to the user-specified location.

python code/fingerDISt.py -i tests/expected_outputs/1aju_model1.pdb_ligands.sdf_SIMPLE.tsv -m tanimoto -verbose -o my_dir

  • Call fingerDISt directly from fingeRNAt: the passed Distance Metrics are calculated on the SIFts output (but not on any wrapped output), and the result is saved in the same default/given location.

python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -fingerDISt tanimoto,tversky

Singularity image

fingeRNAt is also provided as a singularity image. It contains two main command-line programs: fingeRNAt.py and fingerDISt.py and also (as a bonus) the OpenBabel tool-box.

# get information and the overview of the available commands:
./singularity-fingernat.img

# exec the fingernat:
singularity exec ./singularity-fingernat.img fingeRNAt.py

# perform some calculations:
singularity exec ./singularity-fingernat.img fingeRNAt.py -r tests/1aju_model1.pdb -l tests/ligands.sdf -detail -verbose

# use openbabel to convert ligands:
singularity exec ./singularity-fingernat.img obabel tests/ligands.sdf -O tests/ligands.pdbqt -f 1 -l 1

Tested at the Interdisciplinary Centre for Mathematical and Computational Modelling UW and LUMI Supercomputer - thanks!

See the singularity image in action (asciicast recording).

Running fingeRNAt in parallel

One can easily parallelize fingeRNAt with GNU parallel, e.g., for parallel processing of multiple ligands/ligand sets:

# calculate fingerprints for all sdf ligands from directory ligands
# and rna.pdb

find ligands/ -type f -name "*.sdf" | parallel --progress  "fingeRNAt.py -r rna.pdb -l {}"

# the same, but using a singularity image:
find ligands/ -type f -name "*.sdf" | parallel --progress  "singularity exec ./singularity-fingernat.img fingeRNAt.py -r rna.pdb -l {}"

See GNU Parallel for full documentation.
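
Where GNU parallel is not installed, a similar fan-out can be sketched with Python's standard library. The example below is only an illustration: the helper names, the default script path `code/fingeRNAt.py`, and the worker count are assumptions, and only the -r and -l parameters documented above are used.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_commands(ligand_dir, receptor, script="code/fingeRNAt.py"):
    """One fingeRNAt command (as an argument list) per sdf file in ligand_dir."""
    return [
        ["python", script, "-r", str(receptor), "-l", str(sdf)]
        for sdf in sorted(Path(ligand_dir).rglob("*.sdf"))
        if sdf.is_file()
    ]


def run_command(command):
    """Run a single command and return its exit code."""
    return subprocess.run(command).returncode


def run_all(commands, workers=4):
    """Run the commands concurrently; threads suffice as the work is in subprocesses."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_command, commands))


# Usage (executes fingeRNAt once per ligand file):
# exit_codes = run_all(build_commands("ligands", "rna.pdb"))
```

Each ligand file is processed by a separate fingeRNAt process, so the runs are independent of each other, just as in the GNU parallel examples.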

Documentation

To generate the fingeRNAt documentation file using sphinx:

cd docs
make html

The documentation will be available from _build/html.

Unit test

To run a unit test:

cd tests
python fingeRNAt_test.py

Implementation details

See: implementation details.

Contributors

| Contributor | GitHub contact |
| --- | --- |
| Natalia Szulc | @n-szulc |
| Filip Stefaniak | @filipsPL |

Feedback, issues, and questions

We welcome any feedback. Please send an email to Natalia Szulc or submit a bug report.

Discussion and questions may be asked on the discussion page.

Acknowledgments

Special thanks to Masoud Farsani, Pritha Ghosh, and Tomasz Wirecki for their invaluable feedback, as well as to Prof. Janusz M. Bujnicki and the entire Bujnicki Lab for all the support and project guidelines.

Extensive script testing provided by Zuzanna Mackiewicz has been a great help in developing this tool.

Assistance provided by OpenBabel Community was greatly appreciated.

How to cite

If you use this software, please cite:

fingeRNAt - a novel tool for high-throughput analysis of nucleic acid-ligand interactions
Natalia A. Szulc, Zuzanna Mackiewicz, Janusz M. Bujnicki, Filip Stefaniak
PLOS Computational Biology doi: 10.1371/journal.pcbi.1009783

Supplementary code and data regarding the manuscript can be found here.

License

fingeRNAt is licensed under the GNU General Public License v3.0.