fingeRNAt is a software tool for detecting non-covalent interactions formed within complexes of nucleic acids with ligands.
- Overview
- Installation
- Usage
- Quick start ⚡
- fingeRNAt in action
- Parameters description
- Inputs
- Structural Interaction Fingerprint (SIFt) types
- Interactions with inorganic ions
- User-defined interactions
- User-defined thresholds
- Outputs
- Wrappers
- Detail mode
- PyMOL visualization
- Usage examples
- Graphical User Interface
- Debugging mode
- Warnings/Errors
- Frequently Asked Questions (FAQ)
- fingerDISt 📏
- Singularity image
- Running fingeRNAt in parallel
- Documentation
- Unit test
- Implementation details
- Contributors
- Feedback, issues, and questions
- Acknowledgments
- How to cite
- License
fingeRNAt is a Python 3 software tool for detecting non-covalent interactions formed within complexes of nucleic acids with ligands.
Interactions are encoded and saved in the form of a bioinformatics-friendly Structural Interaction Fingerprint (SIFt): a binary string in which the respective bit is set to 1 if a particular interaction is present and to 0 otherwise. This enables high-throughput analysis of the interaction data using data analysis techniques.
Interactions can be calculated for the following complexes:
fingeRNAt runs under Python 3.5 - 3.9 on Linux and macOS.
Supplementary code and data regarding the manuscript can be found here.
Structural Interaction Fingerprint (SIFt) is a binary string describing the existence (1/0) of specified molecular interactions between all receptor's residues and the ligand (Deng et al., 2004).
SIFt translates information about 3D interactions in the receptor-ligand complex into a string, where the respective bit in the fingerprint is set to 1 in case of detecting particular interaction and to 0 otherwise.
Therefore, the interactions are represented in a unified fashion, thus allowing for easy high-throughput computational analysis, as they provide a full picture of all interactions within the complex.
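The encoding described above can be illustrated with a minimal Python sketch. The residue labels, the three interaction names, and the function `encode_sift` are hypothetical and chosen only for illustration; they do not reproduce fingeRNAt's internal code or its full interaction set.

```python
# Hypothetical, shortened list of interaction types (illustration only).
INTERACTIONS = ["HB", "HAL", "CA"]


def encode_sift(residues, detected):
    """Return a binary string with one bit per (residue, interaction) pair.

    `detected` is a set of (residue, interaction) tuples that were found.
    """
    bits = []
    for residue in residues:
        for interaction in INTERACTIONS:
            bits.append("1" if (residue, interaction) in detected else "0")
    return "".join(bits)


residues = ["A:1", "A:2"]
detected = {("A:1", "HB"), ("A:2", "CA")}
sift = encode_sift(residues, detected)
print(sift)  # 100001
```

Because every complex is described by a string of the same length and bit order, fingerprints of different ligands can be compared position by position.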
Recommended fingeRNAt usage is in a conda environment.
Tested under Debian (11 stable), Ubuntu (18.04, 20.04, and 21.10), and macOS (10.15 and 11).
-
Install conda
Please refer to the conda manual and install the conda version with Python 3.x according to your operating system.
-
Download fingeRNAt
Clone the repository
git clone --depth=1 https://github.com/n-szulc/fingernat.git
Or
Download the latest stable release from the releases page.
-
Create conda environment
conda env create -f fingeRNAt/env/fingeRNAt_env.yml
To install fingeRNAt on Debian and Debian-like systems using repository packages (tested under Debian 11 stable and Ubuntu 20.04):
# install packages
sudo apt-get update && sudo apt-get --no-install-recommends -y install openbabel python3.9-minimal python3-openbabel python3-pip python-is-python3 \
python3-pandas python3-numpy python3-rdkit python3-tqdm python3-yaml
# clone the fingeRNAt repository:
git clone --depth=1 https://github.com/n-szulc/fingernat.git
To install fingeRNAt on Debian and Debian-like systems using repository packages and pip-installed packages (tested under Debian 11 stable and Ubuntu 20.04):
# install a minimal python and openbabel tool box:
apt-get update && apt-get --no-install-recommends -y install openbabel python3.9-minimal python3-openbabel python3-pip python-is-python3
# install python packages:
pip install -r env/fingeRNAt_pip.txt
# clone the fingeRNAt repository:
git clone --depth=1 https://github.com/n-szulc/fingernat.git
A Singularity image with the fingeRNAt suite is available in the sylabs cloud: cloud.sylabs.io.
To fetch the latest image directly, run:
singularity pull library://filips/default/fingernat:latest
For usage examples of the image, see section below.
Required dependencies are:
- python 3 (tested on versions 3.5, 3.6, 3.7, 3.8, 3.9)
- openbabel 3.1.1
- numpy
- pandas
- rdkit
- pyyaml
- tk
- tqdm
- sphinx
To call fingeRNAt with example inputs:
conda activate fingernat
cd fingeRNAt
python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf
See the output file with SIFts in the outputs/ directory.
See the basic usage of the fingeRNAt
fingeRNAt accepts the following parameters:
Parameter | Description |
---|---|
-r | path to RNA/DNA structure; see -> Inputs |
[-l] | path to ligands' file; see -> Inputs |
[-f] | optional Structural Interaction Fingerprint (SIFt) type; see -> SIFt types; available types are: FULL [default], SIMPLE, PBS |
[-addH] | optional module name to be used to add hydrogens to ligands' structures; see -> Additional notes; available modules are: OpenBabel [default], RDKit, None |
[-wrapper] | optional SIFt results' wrapper; see -> Wrappers; available types are: ACUG, PuPy, Counter |
[-o] | optional path to save output |
[-h2o] | optional detection of water-mediated interactions; applies only to SIFt type FULL and its wrappers; if not passed, all columns containing information about water-mediated interactions are empty (None) |
[-dha] | optional Donor-Hydrogen-Acceptor angle calculation when detecting hydrogen bonds; see -> Hydrogen Bonds |
[-custom] | path to yaml file with information about additional interactions to be calculated; see -> User-defined interactions |
[-fingerDISt] | fingerDISt Distance Metrics to be calculated (fingerDISt will be directly run on the SIFts output file); see -> Usage examples |
[-print] | print detected interactions for each nucleic acid - ligand complex on screen |
[-detail] | generate an additional file with detailed data on detected interactions; see -> PyMOL visualization |
[-verbose] | provides additional information about performed calculations at the given moment |
[-debug] | enters debug mode; see -> Debugging mode |
[-h] | show help message |
- -r: path to receptor - RNA/DNA structure
  - supported file type: pdb
  - only 1 model of RNA/DNA structure
    - if there are more models, you have to choose only one (e.g., manually delete remaining models)
  - no ligands
  - may contain water & ions
  🔴 Hydrogens need to be added
- -l: optional path to ligand - small molecule, RNA/DNA/LNA, or protein
  - small molecule ligands
    - supported file types: sdf
    - possible multiple poses of ligands in one file
  - RNA/DNA structure
    - supported file type: sdf
    - possible multiple models of RNA/DNA structure
    - only RNA/DNA chains
    - no water, ions, ligands
  🔵 If -l is not specified, fingeRNAt will find all inorganic ions in the receptor file and treat them as ligands; see -> Interactions with inorganic ions.
Additional notes:
- In the receptor molecule, charges on the phosphate groups do not need to be assigned (fingeRNAt always treats OP1 and OP2 atoms as negatively charged anions).
- Receptor's residues with less than four atoms are not considered in calculations.
- Input ligand molecules should have assigned desired protonation state and formal charges. Please pay attention to sdf ligand files converted from pdbqt/mol2 files if the formal charges are preserved in the sdf files.
- All the missing ligands' hydrogens will be automatically added unless -addH None is passed.
- Ligands with added hydrogens will be saved to a new sdf file (to the same directory as input) with _OB_addedH or _RDKit_addedH suffix, depending on the selected module for -addH; for -addH None no new sdf file will be saved, as it does not add hydrogens.
fingeRNAt can calculate the following SIFt types:
- FULL
  Calculates nine non-covalent interactions for each RNA/DNA residue - ligand pair:
  - hydrogen bonds (HB)
  - halogen bonds (HAL)
  - cation - anion (CA)
  - Pi - cation (Pi_Cation)
  - Pi - anion (Pi_anion)
  - Pi - stacking (Pi_Stacking)
  - ion-mediated; distinguishes between:
    - Magnesium-mediated (Mg_mediated)
    - Potassium-mediated (K_mediated)
    - Sodium-mediated (Na_mediated)
    - Other ion-mediated (Other_mediated)
  - water-mediated (Water_mediated); only if the -h2o parameter was passed; otherwise this interaction is assigned as None
  - lipophilic (lipophilic_mediated)
  🔶 Returns twelve 0/1 values for each residue.
  NOTE: It is possible to calculate more interactions specified by the user; see -> User-defined interactions
- SIMPLE
  Calculates distances between each RNA/DNA residue and the ligand; returns 1 if the distance does not exceed the declared threshold (default = 4.0 Å), 0 otherwise. Does not take into account distances between hydrogens or hydrogen - heavy atom.
  🔶 Returns one 0/1 value for each residue.
- PBS
  Divides each RNA/DNA residue into three groups: Phosphate, Base, Sugar. Then, for each group separately, calculates the distance to the ligand; returns 1 if the distance does not exceed the declared threshold (default = 4.0 Å), 0 otherwise. Does not take into account distances between hydrogens or hydrogen - heavy atom.
  🔶 Returns three 0/1 values for each residue.
  NOTE: Only for RNA/DNA with canonical residues.
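The contact-based types SIMPLE and PBS can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not fingeRNAt's implementation: atoms are given as hypothetical (name, coordinates) tuples, hydrogens are recognized by a name starting with "H", the phosphate group is taken to be P, OP1, and OP2, and sugar atoms are taken to be those whose names end with a prime.

```python
import math

# Assumption: atom names that make up the phosphate group.
PHOSPHATE_ATOMS = {"P", "OP1", "OP2"}


def distance(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def is_hydrogen(atom_name):
    # Assumption: hydrogen atom names start with "H" (e.g., H5', HO2').
    return atom_name.startswith("H")


def simple_bit(residue_atoms, ligand_atoms, cutoff=4.0):
    """Return 1 if any heavy-atom pair lies within `cutoff` angstroms, else 0."""
    for res_name, res_xyz in residue_atoms:
        if is_hydrogen(res_name):
            continue
        for lig_name, lig_xyz in ligand_atoms:
            if is_hydrogen(lig_name):
                continue
            if distance(res_xyz, lig_xyz) <= cutoff:
                return 1
    return 0


def pbs_group(atom_name):
    """Assign an atom to the Phosphate, Sugar, or Base group by its name."""
    if atom_name in PHOSPHATE_ATOMS:
        return "Phosphate"
    if atom_name.endswith("'"):
        return "Sugar"
    return "Base"


def pbs_bits(residue_atoms, ligand_atoms, cutoff=4.0):
    """Return one contact bit per group, ordered (Phosphate, Base, Sugar)."""
    bits = []
    for group in ("Phosphate", "Base", "Sugar"):
        members = [(n, xyz) for n, xyz in residue_atoms if pbs_group(n) == group]
        bits.append(simple_bit(members, ligand_atoms, cutoff))
    return tuple(bits)


residue = [
    ("P", (0.0, 0.0, 0.0)),
    ("C1'", (6.0, 0.0, 0.0)),
    ("H1'", (9.0, 0.0, 0.0)),   # hydrogen: ignored even though it is 4.0 A away
    ("N1", (12.0, 0.0, 0.0)),
]
ligand = [("C1", (13.0, 0.0, 0.0)), ("H1", (3.0, 0.0, 0.0))]

print(simple_bit(residue, ligand))  # 1
print(pbs_bits(residue, ligand))    # (0, 1, 0)
```

In this toy residue only the base atom N1 is within 4.0 Å of a ligand heavy atom, so SIMPLE reports a contact and PBS attributes it to the Base group.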
It is possible to calculate contacts for nucleic acid - ions.
Ions must be in the same input pdb file, and parameter -l should be omitted. fingeRNAt will treat all inorganic ions as ligands and calculate SIFt for each residue - ion pair.
As the aforementioned interactions are detected based on contacts, only SIFt type SIMPLE or PBS may be calculated.
Usage example
python code/fingeRNAt.py -r example_inputs/3d2v.pdb -f PBS
Sample output
see -> 3d2v.pdb_IONS_SIMPLE.tsv
The user may define custom interactions to be detected. SMARTS patterns are used to define interacting atoms, and interaction definitions are encoded in a simple yaml file.
The interactions will be added as new columns to the standard SIFts outputs (also works with all the wrappers) or as new rows to -detail outputs and can be visualized using our PyMOL plugin.
Examples of various SMARTS patterns are available at daylight.com. To check if the defined SMARTS will hit desired atoms/groups, you may want to use our Jupyter Notebook.
Sample YAML file is provided with the fingeRNAt code. To validate your yaml file syntax, you can use an online validator.
NOTE: Additional interactions can be calculated only for the fingerprint type FULL.
Three types of interactions are currently supported:
Given:
- 1 SMARTS for the receptor
- 1 SMARTS for the ligand
- minimum and maximum distance
Detects atoms fulfilling SMARTS conditions for the receptor and all ligands.
Checks if the distance between each pair of such atoms is within the provided distance range. If so, the interaction is detected.
Example:
NA-AA:
  Receptor_SMARTS:
    - '[!#1]'
  Ligand_SMARTS:
    - '[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]'
  Distance:
    min: 0.5
    max: 3.5
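The distance-only check described above can be sketched as follows. This is a minimal illustration, assuming the SMARTS matching has already produced lists of atom coordinates and that both distance bounds are inclusive; the function names are hypothetical and are not part of fingeRNAt.

```python
import math


def within_range(receptor_xyz, ligand_xyz, d_min, d_max):
    """Return True if the inter-atomic distance lies in [d_min, d_max]."""
    d = math.sqrt(sum((r - l) ** 2 for r, l in zip(receptor_xyz, ligand_xyz)))
    return d_min <= d <= d_max


def find_pairs(receptor_atoms, ligand_atoms, d_min=0.5, d_max=3.5):
    """Return (receptor index, ligand index) pairs that satisfy the range."""
    return [
        (i, j)
        for i, r in enumerate(receptor_atoms)
        for j, l in enumerate(ligand_atoms)
        if within_range(r, l, d_min, d_max)
    ]


# Coordinates of atoms that matched the receptor and ligand SMARTS (made up).
receptor_atoms = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ligand_atoms = [(3.0, 0.0, 0.0), (0.2, 0.0, 0.0)]

pairs = find_pairs(receptor_atoms, ligand_atoms)
print(pairs)  # [(0, 0)]
```

The second ligand atom is closer than the minimum distance of 0.5 Å, so it is rejected just like atoms beyond the maximum.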
Given:
- 1 SMARTS for the receptor
- 2 SMARTS for the ligand
- minimum and maximum distance
- minimum and maximum angle
Detects atoms fulfilling SMARTS conditions for the receptor and all ligands.
Checks if the distance between atoms fulfilling receptor's SMARTS and ligand's first SMARTS is within the provided distance range. If so, the angle between the three atoms is calculated. If its value is within the provided angle range, the interaction is detected.
Example:
weak_hbond_Don-Lig__Acc-NA:
  Receptor_SMARTS:
    - '[!$([#1,#6,F,Cl,Br,I,o,s,nX3,#7v5,#15v5,#16v4,#16v6,*+1,*+2,*+3])]' # HBA
  Ligand_SMARTS:
    - '[#1;$([#1]-[C])]' # hydrogen connected to C
    - '[#6!H0]' # C in hydrogen bond donor
  Distance:
    # H···O
    min: 0.5
    max: 3.05
  Angle1:
    # PROTEINS: Structure, Function, and Bioinformatics 67:128–141 (2007)
    # C–H···O
    min: 90
    max: 180
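The distance-plus-angle test used in this example can be sketched as below. It is an illustration under assumptions: the angle is measured at the hydrogen (the middle atom of C–H···O, as suggested by the comment in the yaml example), both ranges are treated as inclusive, and the function names are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def angle_deg(a, vertex, c):
    """Angle a-vertex-c in degrees."""
    v1 = [p - q for p, q in zip(a, vertex)]
    v2 = [p - q for p, q in zip(c, vertex)]
    dot = sum(p * q for p, q in zip(v1, v2))
    norm = math.sqrt(sum(p * p for p in v1)) * math.sqrt(sum(q * q for q in v2))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding errors
    return math.degrees(math.acos(cosine))


def weak_hbond(donor_c, donor_h, acceptor,
               d_range=(0.5, 3.05), a_range=(90.0, 180.0)):
    """Check the H...acceptor distance first, then the C-H...acceptor angle."""
    d = distance(donor_h, acceptor)
    if not d_range[0] <= d <= d_range[1]:
        return False
    return a_range[0] <= angle_deg(donor_c, donor_h, acceptor) <= a_range[1]


donor_c = (-1.09, 0.0, 0.0)
donor_h = (0.0, 0.0, 0.0)

# Acceptor in line with the C-H bond: distance 2.5 A, angle 180 degrees.
print(weak_hbond(donor_c, donor_h, (2.5, 0.0, 0.0)))   # True
# Same distance, but the acceptor sits behind the hydrogen: angle below 90.
print(weak_hbond(donor_c, donor_h, (-1.5, 2.0, 0.0)))  # False
```

Checking the distance before the angle mirrors the order given in the description and avoids computing angles for atom pairs that are too far apart.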
Given:
- 2 SMARTS for the receptor
- 1 SMARTS for the ligand
- minimum and maximum distance
- minimum and maximum angle
Detects atoms fulfilling SMARTS conditions for the receptor and all ligands.
Checks if the distance between atoms fulfilling receptor's second SMARTS and ligand's SMARTS is within the provided distance range. If so, the angle between the three atoms is calculated. If its value is within the provided angle range, the interaction is detected.
Example:
weak_hbond_Don-NA__Acc-Lig:
  Receptor_SMARTS:
    - '[#6!H0]' # C in hydrogen bond donor
    - '[#1;$([#1]-[C])]' # hydrogen connected to C
  Ligand_SMARTS:
    - '[!$([#1,#6,F,Cl,Br,I,o,s,nX3,#7v5,#15v5,#16v4,#16v6,*+1,*+2,*+3])]' # HBA
  Distance:
    # H···O
    min: 0.5
    max: 3.05
  Angle1:
    # PROTEINS: Structure, Function, and Bioinformatics 67:128–141 (2007)
    # C–H···O
    min: 90
    max: 180
Given:
- 2 SMARTS for the receptor
- 2 SMARTS for the ligand
- minimum and maximum distance
- minimum and maximum angle no 1
- minimum and maximum angle no 2
Detects atoms fulfilling SMARTS conditions for the receptor and all ligands.
Checks if the distance between atoms fulfilling receptor's second SMARTS and first ligand's SMARTS is within the provided distance range. If so, the angle between the receptor's atom 1 - receptor's atom 2 - ligand's atom 1 is calculated. If its value is within the provided angle no 1 range, the angle between the receptor's atom 2 - ligand's atom 1 - ligand's atom 2 is calculated. If its value is within the provided angle no 2 range, the interaction is detected.
NOTE: Atoms matching any of the receptor's SMARTS are not considered if they do not belong to the receptor (e.g., ions/water present in the structure) or if they belong to a receptor residue with fewer than four atoms.
NOTE 2: If two SMARTS are given for the receptor/ligand, receptor's atom 1 and receptor's atom 2 (and/or ligand's atom 1 and ligand's atom 2) will be considered only if bound (this will be detected by the fingeRNAt).
Example:
multipolar_halogen_bond:
  Receptor_SMARTS:
    # carbonyl oxygen (non-bonding)
    - '[$([OH0]=[CX3,c]);!$([OH0]=[CX3,c]-[OH,O-])]'
    # carbonyl carbon, forms the bond
    - '[$([CX3,c]=[OH0]);!$([CX3,c](=[OH0])-[OH,O-])]'
  Ligand_SMARTS:
    - '[F,Cl,Br,I]' # halogen, forms the bond
    - '[#6]' # any carbon atom connected to the halogen
  Distance:
    min: 0.5
    max: 3.65
  Angle1:
    min: 70 # receptor, teta2 - O=C⋯X
    max: 110 # receptor, teta2 - O=C⋯X
  Angle2:
    min: 90 # ligand, teta1 - C⋯X-#6
    max: 180 # ligand, teta1 - C⋯X-#6
All the default thresholds can be changed in code/config.py.
Outputs are saved to tsv files. Tsv is a simple text format similar to csv, except for the data being tab-separated instead of comma-separated.
If fingeRNAt was run without the optional parameter -o, the script will create the outputs/ directory in the working directory and save the output there in tsv format. Otherwise, fingeRNAt will save outputs in the user-specified location.
Example outputs for different SIFt types, their wrappers, and -detail are available from fingeRNAt/example_outputs.
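Since the outputs are plain tab-separated text, they can be read with Python's standard library alone. The sketch below assumes a layout with one row per ligand and one column per residue-interaction pair; the column labels and the in-memory sample are made up for illustration and may differ from the real files in example_outputs.

```python
import csv
import io

# Made-up sample standing in for a SIFt tsv file (layout is an assumption).
sample = (
    "Ligand_name\tA:1#HB\tA:1#Water_mediated\n"
    "lig1\t1\tNone\n"
    "lig2\t0\tNone\n"
)


def read_sift_tsv(handle):
    """Return (column names, {row label: list of values}) from a tsv handle.

    The literal string "None" (used for interactions that were not
    calculated) is converted to Python's None.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = {}
    for row in reader:
        label, values = row[0], row[1:]
        rows[label] = [None if v == "None" else int(v) for v in values]
    return header[1:], rows


columns, fingerprints = read_sift_tsv(io.StringIO(sample))
print(columns)               # ['A:1#HB', 'A:1#Water_mediated']
print(fingerprints["lig1"])  # [1, None]
```

For a real file, pass an opened file object instead of `io.StringIO(sample)`, e.g. `open(path, newline="")`.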
Sample extract of output of running python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -h2o
See sample output -> 1aju_model1.pdb_ligands.sdf_FULL.tsv
NOTE: If fingeRNAt was called without the -h2o parameter, all columns containing information about water-mediated interactions are empty (None); this also applies to wrappers (see -> 'Parameters description').
Sample extract of output of running python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -f SIMPLE
See sample output -> 1aju_model1.pdb_ligands.sdf_SIMPLE.tsv
Sample extract of output of running python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -f PBS
See sample output -> 1aju_model1.pdb_ligands.sdf_PBS.tsv
Calculated SIFt (of any type) can be wrapped, which allows for representing it in lower resolution.
The results of the SIFt calculations and all passed wrappers are saved to separate tsv files. Multiple wrappers may be passed at once (comma-separated; see -> 'Usage examples').
Three types of wrappers are available.
Wraps calculated results according to a nucleotide. Provides information if a particular kind of interaction between, e.g., any adenine from RNA/DNA and ligand occurred.
Sample extract of output of running python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -f PBS -wrapper ACUG
See sample output -> 1aju_model1.pdb_ligands.sdf_FULL_ACUG.tsv
Wraps calculated results according to nucleobase type (purine or pyrimidine). Provides information if a particular kind of interaction between, e.g., any purine from RNA/DNA and ligand occurred.
Sample extract of output of running python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -wrapper PuPy
See sample output -> 1aju_model1.pdb_ligands.sdf_FULL_PuPy.tsv
NOTE: As the -h2o parameter was not passed, the columns containing information about water-mediated interactions are empty (None) (see -> 'Parameters description').
Counts the total number of given interaction types.
Sample extract of output of running python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -wrapper Counter
See sample output -> 1aju_model1.pdb_ligands.sdf_FULL_Counter.tsv
NOTE: As the -h2o parameter was not passed, the columns containing information about water-mediated interactions are empty (None) (see -> 'Parameters description').
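The three wrappers described above can be sketched on a toy data structure. This is only an illustration of the idea: the input format (a list of nucleotide name and per-interaction bits), the handling of RNA letters only, and all function names are assumptions, not fingeRNAt's actual code or output format.

```python
CANONICAL = {"A", "C", "U", "G"}  # assumption: RNA letters only
PURINES = {"A", "G"}


def acug_group(nucleotide):
    """Group by nucleotide; non-canonical names are omitted (None)."""
    return nucleotide if nucleotide in CANONICAL else None


def pupy_group(nucleotide):
    """Group by nucleobase type; non-canonical names are omitted (None)."""
    if nucleotide not in CANONICAL:
        return None
    return "Purine" if nucleotide in PURINES else "Pyrimidine"


def wrap(residues, group_of):
    """Set a bit for a group if any residue in that group has the interaction."""
    wrapped = {}
    for nucleotide, bits in residues:
        group = group_of(nucleotide)
        if group is None:
            continue
        target = wrapped.setdefault(group, {})
        for interaction, bit in bits.items():
            target[interaction] = max(target.get(interaction, 0), bit)
    return wrapped


def counter(residues):
    """Count how many residues show each interaction type."""
    totals = {}
    for _, bits in residues:
        for interaction, bit in bits.items():
            totals[interaction] = totals.get(interaction, 0) + bit
    return totals


residues = [
    ("A", {"HB": 1, "CA": 0}),
    ("G", {"HB": 0, "CA": 1}),
    ("U", {"HB": 0, "CA": 0}),
    ("X", {"HB": 1, "CA": 1}),  # non-canonical residue name
]

print(wrap(residues, acug_group))
print(wrap(residues, pupy_group))
print(counter(residues))  # {'HB': 2, 'CA': 2}
```

The wrapped fingerprints are shorter than the per-residue SIFt, which is what representing it "in lower resolution" amounts to.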
fingeRNAt has an additional mode, -detail, which saves all detected interactions, for any SIFt type, to a separate tsv file (file with prefix DETAIL_).
The interactions saved in -detail mode follow the same convention - each row contains information about the interaction between a given ligand-receptor pair, thus allowing for their easy, high-throughput processing.
The abovementioned data also serve as input to the dedicated PyMOL plugin, created by us, to visualize all interactions detected by fingeRNAt (see -> PyMOL visualization).
NOTE: In the case of ion- and water-mediated interactions, two rows will correspond to one such interaction: (i) ligand - ion/water molecule, and (ii) ion/water molecule - RNA/DNA. Therefore in case (i) ion/water molecule is treated as the receptor and in case (ii) as the ligand.
NOTE 2: We refer to ligands as structures from the ligands' file (-l), as water molecules and inorganic ions are supposed to be provided in the RNA/DNA input pdb file.
The following data are saved in -detail mode:
- Ligand_name
- for ligands from sdf file: ligand's name
- for inorganic ions: ion's name
- for water molecules: HOH
- Ligand_pose
- for ligands from sdf file: ligand's pose number (indexed from 1)
- for inorganic ions/water molecules: 0
- Ligand_occurrence_in_sdf
  - if -l ligands' file was passed
    - for ligands from sdf file: ligand's occurrence from the beginning of sdf file
    - for inorganic ions/water molecules: ligand's occurrence (from the beginning of sdf file), for which they mediate the interaction with nucleic acid
  - if -l ligands' file was not passed: 0 (see -> Interactions with inorganic ions)
- Interaction: interaction type
- Ligand_Atom
- for ligands from sdf file: ligand's atom index
- for inorganic ions/water molecules: residue number : chain
- Ligand_X/Ligand_Y/Ligand_Z: coordinates of ligand's/ion's/water molecule's atom
- Receptor_Residue_Name/Receptor_Number/Receptor_Chain:
- for interactions ligand/inorganic ions/water molecule - nucleic acid: receptor's nucleotide name/receptor's residue number/receptor's chain
- for interactions ligand - ion: ion's name/ion's residue number/ion's chain
- for interactions ligand - water molecule: HOH/water molecule's residue number/water molecule's chain
- Receptor_Atom
- for interactions ligand/inorganic ions/water molecule - nucleic acid: receptor's atom ID
- for interactions ligand - ion: ion's name
- for interactions ligand - water molecule: O
- Receptor_X/Receptor_Y/Receptor_Z: coordinates of receptor's/ion's/water molecule's atom
- Distance: distance between ligand's and residue's atoms [Å]
See sample output -> DETAIL_1aju_model1.pdb_ligands.sdf_FULL.tsv
Detected interactions can be visualized using the dedicated PyMOL plugin, available in the plugin repository.
To visualize interactions in this plugin, use the -detail outputs mode.
- Calculate SIFt FULL, print detected interactions on screen, and save the output in the default location with the default filename.
python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -print
- Calculate SIFt FULL with user-defined interactions and a table containing details on each detected interaction with the default filenames in the outputs directory.
python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -custom code/custom-interactions.yaml -detail
- Calculate SIFt SIMPLE and save the output in the user-declared location with the default filename.
python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -f SIMPLE -o /path/to/my_output/
- Calculate SIFt PBS, see what is being calculated using the verbose mode, and save the output with the default filename in the outputs/ directory.
python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -f PBS -verbose
- Calculate default SIFt FULL and save its output along with a table containing details on each detected interaction with the default filenames in the outputs directory.
python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -detail
- Calculate default SIFt FULL, consider water-mediated interactions, and save its output and three wrapped outputs with the default filenames in the outputs directory.
python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -h2o -wrapper ACUG,PuPy,Counter
- Calculate SIFt for nucleic acid - inorganic ions from the same pdb input file (see -> Interactions with inorganic ions).
python code/fingeRNAt.py -r example_inputs/3d2v.pdb -f PBS
- Bonus: useful bash command to transpose resulting fingerprint file (rs program needed: sudo apt install rs for Debian-like systems):
cat fingerprint_FULL.tsv | sed -e "s/\t/,/g" | rs -c, -C, -T|sed -e 's/.$//' -e "s/,/\t/g" > fingerprint_FULL-T.tsv
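If the rs program is not available, the same transposition can be done with Python's standard library. This is a minimal sketch; the function name `transpose_tsv` and the file names in the usage note are hypothetical, and the table is assumed to be rectangular (every row has the same number of fields).

```python
import csv


def transpose_tsv(src_path, dst_path):
    """Write a transposed copy of a tab-separated file (rows become columns)."""
    with open(src_path, newline="") as src:
        rows = list(csv.reader(src, delimiter="\t"))
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        # zip(*rows) pairs up the i-th field of every row, i.e. one column.
        writer.writerows(zip(*rows))
```

For example, `transpose_tsv("fingerprint_FULL.tsv", "fingerprint_FULL-T.tsv")` would produce a file with one row per fingerprint column.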
To use the Graphical User Interface (GUI), simply run
python code/gui.py
GUI is user-friendly and has all the aforementioned functionalities.
fingeRNAt has a debugging mode in which it prints on screen exhaustive information about detected interactions.
The debugging mode may be used with each SIFt type and provides the following information:
- FULL
Prints the following properties of each ligand:
- atom indices of hydrogen bonds acceptors & donors
- atom indices of halogen bonds donors
- atom indices of cations & anions
- atom indices of ligand's aromatic rings
- IDs of inorganic ions in electrostatic contact with a ligand
- IDs of water molecules in contact with a ligand
- atom indices of lipophilic atoms
- atom indices for each user-defined interaction (single atoms if 1 SMARTS for ligand and tuples if 2 SMARTS for ligand were passed)
Prints the following properties for each residue of nucleic acid:
- atom IDs of hydrogen bonds acceptors & donors
- atom IDs of anions
- atom IDs of aromatic rings
- atom IDs for each user-defined interaction (single atoms if 1 SMARTS for receptor and tuples if 2 SMARTS for receptor were passed)
Prints detected interactions for each nucleic acid residue - ligand pair:
- atoms creating hydrogen bond with their distance and angle (if parameter -dha was passed), separately for cases when the nucleic acid is hydrogen bond acceptor and hydrogen bond donor
- atoms creating halogen bond with their distance and angles
- atoms creating cation-anion interaction with their distance (note that nucleic acid's atoms are anions)
- atoms creating Pi-cation interaction with their distance and angle
- atoms creating Pi-anion interaction with their distance and angle (note that ligand's atoms are anions)
- atoms creating anion-Pi interaction with their distance and angle (note that nucleic acid's atoms are anions)
- atoms creating Pi-stacking interaction type Sandwich/Displaced with their distance, offset, and angle
- atoms creating Pi-stacking interaction type T-shaped with their distance, offset, and angle
- atoms creating ion-mediated interaction with their distances (nucleic acid - ion & ion - ligand)
- atoms creating water-mediated interaction with their distances (nucleic acid - water & water - ligand)
- atoms creating lipophilic interaction with their distance
- atoms creating each user-defined interaction with their distance and/or angle(s)
- SIMPLE
  For each detected contact of nucleic acid residue - ligand, prints information about atom IDs and the distance between them.
- PBS
  For each detected contact of nucleic acid residue's group - ligand, prints information about atom IDs and the distance between them.
NOTE: In all the above cases, only the first detected interaction of a given type is printed, as fingeRNAt stops searching for more once it has detected one particular interaction.
Please pay attention to the following types of errors: Could not sanitize molecule ending on line ....
This means that the RDKit library used by the fingeRNAt cannot properly read the molecule.
e.g.
[08:46:45] non-ring atom 39 marked aromatic
[08:46:45] ERROR: Could not sanitize molecule ending on line 168
[08:46:45] ERROR: non-ring atom 39 marked aromatic
- Error: non-ring atom ... marked aromatic
- Solution: please make sure that the mentioned molecule has a proper aromatic ring representation.
- Error: Explicit valence for atom # ..., is greater than permitted (e.g., Explicit valence for atom # 18 O, 3, is greater than permitted)
- Solution: please make sure that the indicated atom(s) have a proper valence number, i.e., they form a correct number of bonds.
What happens when I have a non-canonical nucleotide in my nucleic acid?
- If you have a residue with only a non-canonical name (all atom names are canonical), e.g., X
 | FULL | SIMPLE | PBS |
---|---|---|---|
No wrapper | OK | OK | OK |
ACUG | Omits interaction for residue with a non-canonical name | Omits interaction for residue with a non-canonical name | Omits interaction for residue with a non-canonical name |
PuPy | Omits interaction for residue with a non-canonical name | Omits interaction for residue with a non-canonical name | Omits interaction for residue with a non-canonical name |
Counter | OK | OK | OK |
- If you have a residue with a canonical name but with a non-canonical atom name, e.g., P9
 | FULL | SIMPLE | PBS |
---|---|---|---|
No wrapper | OK | OK | Does not work |
ACUG | OK | OK | Does not work |
PuPy | OK | OK | Does not work |
Counter | OK | OK | Does not work |
NOTE: We consider both oxygens from the phosphate group (OP1 and OP2) of nucleic acid as negatively charged, therefore fingeRNAt will not consider differently named atoms as anions.
- If you have a residue with a non-canonical name and non-canonical atom name, e.g., P9
 | FULL | SIMPLE | PBS |
---|---|---|---|
No wrapper | OK | OK | Does not work |
ACUG | Omits interaction for residue with a non-canonical name | Omits interaction for residue with a non-canonical name | Does not work |
PuPy | Omits interaction for residue with a non-canonical name | Omits interaction for residue with a non-canonical name | Does not work |
Counter | OK | OK | Does not work |
NOTE: We consider both oxygens from the phosphate group (OP1 and OP2) of nucleic acid as negatively charged, therefore fingeRNAt will not consider differently named atoms as anions.
What happens when I have nucleic acid with two residues with the same number, e.g., due to errors in structure?
In the case of SIFt types SIMPLE and PBS, the only difference is that their outputs will have two columns with the same name. SIFt and the wrapped results are correct.
However, in the case of SIFt type FULL, there will be two columns with the same name, but their Pi-interactions may be swapped, and their SIFt, as well as the wrapped results, may be unreliable.
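To spot such structures before running the calculations, the receptor file can be scanned for residue numbers that occur more than once within a chain. The sketch below is a hypothetical pre-check, not part of fingeRNAt; it relies on the fixed-column layout of PDB ATOM/HETATM records (residue name in columns 18-20, chain in column 22, residue number in columns 23-26, insertion code in column 27). The helper `make_atom_line` only builds demo lines.

```python
from collections import defaultdict


def duplicated_residue_numbers(pdb_lines):
    """Return sorted (chain, number) keys shared by more than one residue."""
    seen = defaultdict(set)
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            res_name = line[17:20].strip()
            chain = line[21]
            number = line[22:26].strip()
            icode = line[26:27]
            # Residues differing in name or insertion code count as distinct.
            seen[(chain, number)].add((res_name, icode))
    return sorted(key for key, variants in seen.items() if len(variants) > 1)


def make_atom_line(serial, atom, res_name, chain, res_seq, icode=" "):
    """Build the leading 27 columns of a PDB ATOM record (demo helper)."""
    return "ATOM  {:>5} {:<4} {:>3} {}{:>4}{}".format(
        serial, atom, res_name, chain, res_seq, icode
    )


lines = [
    make_atom_line(1, "P", "G", "A", 16),
    make_atom_line(2, "P", "C", "A", 16),  # same number as the previous residue
    make_atom_line(3, "P", "U", "A", 17),
]

print(duplicated_residue_numbers(lines))  # [('A', '16')]
```

With a real file, the lines can be supplied directly from an opened file object, e.g. `duplicated_residue_numbers(open("receptor.pdb"))`.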
fingerDISt is an additional, standalone tool that calculates different Distance Metrics based on Structural Interaction Fingerprint (SIFt) - outputs of fingeRNAt.
It calculates the selected Distance Metric for all SIFt vs. all SIFt from the input file - creates a matrix of scores and saves it to a tsv file.
fingerDISt, like fingeRNAt, requires Python 3.5 - 3.9. It may be run from within fingeRNAt's environment, but this is not obligatory. No external modules are needed.
cd fingeRNAt
python code/fingerDISt.py -i example_outputs/1aju_model1.pdb_ligands.sdf_FULL.tsv -m tanimoto
fingerDISt accepts the following parameters:
Parameter | Description |
---|---|
-i | path to tsv/csv file with calculated SIFt; see -> Inputs |
-m | types of desired Distance Metrics; see -> Distance Metrics |
[-o] | optional path to save the output |
[-verbose] | prints calculated Distance Metrics on the screen |
[-h] | show help message |
-i: path to tsv/csv file with calculated SIFts
✨ fingeRNAt outputs are fingerDISt inputs ✨
fingerDISt calculates the following Distance Metrics:
- Tanimoto coefficient
- Cosine similarity
- Manhattan
- Euclidean
- Square Euclidean
- Half Square Euclidean
- Soergel
- Tversky
Some Distance Metrics calculations were implemented based on the crux-fr-sprint code under the MIT license.
NOTE: The Tanimoto coefficient works only for SIFt with binary values; therefore, it may not work on input SIFt wrapped with the Counter wrapper.
NOTE 2: fingerDISt automatically replaces None with 0, meaning that Distance Metrics can be calculated for SIFt type FULL, which was called without the -h2o parameter.
NOTE 3: The Tversky coefficient is not symmetric. By default, in the resulting matrix, the reference molecules are in columns while compared molecules are in rows. It also has hard-coded α and β coefficients with the widely used values of α=1 and β=0 (e.g., see: Leung et al. "SuCOS is Better than RMSD for Evaluating Fragment Elaboration and Docking Poses"). To modify this behavior or the coefficient values, please modify the function tversky(self, p_vec, q_vec) in the code/DistanceMetrics.py module.
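For orientation, three of the listed metrics can be written out for binary fingerprints as below. These are textbook definitions in a minimal sketch, not a copy of code/DistanceMetrics.py; in particular, returning 0.0 when both fingerprints are empty and treating the first argument as the reference in `tversky` are assumptions.

```python
def tanimoto(p, q):
    """|p AND q| / |p OR q| for two binary vectors of equal length."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0  # assumption for empty vectors


def tversky(p, q, alpha=1.0, beta=0.0):
    """Tversky index; asymmetric unless alpha equals beta."""
    common = sum(1 for a, b in zip(p, q) if a and b)
    only_p = sum(1 for a, b in zip(p, q) if a and not b)
    only_q = sum(1 for a, b in zip(p, q) if b and not a)
    denominator = common + alpha * only_p + beta * only_q
    return common / denominator if denominator else 0.0


def manhattan(p, q):
    """Sum of absolute differences between corresponding positions."""
    return sum(abs(a - b) for a, b in zip(p, q))


p = [1, 1, 0, 1]
q = [1, 0, 0, 1]

print(round(tanimoto(p, q), 3))  # 0.667
print(round(tversky(p, q), 3))   # 0.667
print(tversky(q, p))             # 1.0
print(manhattan(p, q))           # 1
```

Swapping the arguments of `tversky` changes the result, which is why the orientation of the output matrix matters for this metric.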
fingerDISt saves scores for each selected Distance Metric to separate tsv files - a simple text format similar to csv, except for the data being tab-separated instead of comma-separated.
If fingerDISt was run without the optional parameter -o, the script will create the outputs/ directory in the working directory and save the output there in tsv format. Otherwise, fingerDISt will save outputs in the user-specified location.
Sample output of running python code/fingerDISt.py -i tests/expected_outputs/1aju_model1.pdb_ligands.sdf_FULL.tsv -m tanimoto
- Calculate all available Distance Metrics on SIFts inputs type FULL and save the output with the default filename in the outputs directory.
python code/fingerDISt.py -i tests/expected_outputs/1aju_model1.pdb_ligands.sdf_FULL.tsv -m manhattan,square_euclidean,euclidean,half_square_euclidean,cosine_similarity,tanimoto,soergel,tversky
- Calculate two Distance Metrics on SIFts inputs type PBS wrapped with the ACUG wrapper and save the output to a user-specified location.
python code/fingerDISt.py -i tests/expected_outputs/1aju_model1.pdb_ligands.sdf_PBS_ACUG.tsv -m manhattan,square_euclidean -o my_dir
- Calculate one Distance Metric on SIFts inputs type SIMPLE, print it on the screen, and save the output to the user-specified location.
python code/fingerDISt.py -i tests/expected_outputs/1aju_model1.pdb_ligands.sdf_SIMPLE.tsv -m tanimoto -verbose -o my_dir
- Call fingerDISt directly from fingeRNAt (this calculates the passed Distance Metrics on the calculated SIFts output, but not on any wrapped output, and saves the result in the same default/given location).
python code/fingeRNAt.py -r example_inputs/1aju_model1.pdb -l example_inputs/ligands.sdf -fingerDISt tanimoto,tversky
fingeRNAt is also provided as a singularity image. It contains two main command-line programs, fingeRNAt.py and fingerDISt.py, and also (as a bonus) the OpenBabel tool-box.
# get information and the overview of the available commands:
./singularity-fingernat.img
# exec the fingernat:
singularity exec ./singularity-fingernat.img fingeRNAt.py
# perform some calculations:
singularity exec ./singularity-fingernat.img fingeRNAt.py -r tests/1aju_model1.pdb -l tests/ligands.sdf -detail -verbose
# use openbabel to convert ligands:
singularity exec ./singularity-fingernat.img obabel tests/ligands.sdf -O tests/ligands.pdbqt -f 1 -l 1
Tested at the Interdisciplinary Centre for Mathematical and Computational Modelling UW and LUMI Supercomputer - thanks!
See the singularity image in action:
One can easily parallelize fingeRNAt with GNU parallel, e.g., for parallel processing of multiple ligands/ligand sets:
# calculate fingerprints for all sdf ligands from directory ligands
# and rna.pdb
find ligands/ -type f -name "*.sdf" | parallel --progress "fingeRNAt.py -r rna.pdb -l {}"
# the same, but using a singularity image:
find ligands/ -type f -name "*.sdf" | parallel --progress "singularity exec ./singularity-fingernat.img fingeRNAt.py -r rna.pdb -l {}"
See GNU Parallel for full documentation.
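Where GNU parallel is not installed, a comparable fan-out can be sketched with Python's standard library. This is an illustrative alternative, not a feature of fingeRNAt: the function names, the default script path, and the number of workers are assumptions, and only the -r and -l parameters described above are used.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_commands(receptor, ligand_dir, script="code/fingeRNAt.py"):
    """Build one fingeRNAt command line per sdf file found in `ligand_dir`."""
    ligands = sorted(Path(ligand_dir).rglob("*.sdf"))
    return [
        ["python", script, "-r", str(receptor), "-l", str(ligand)]
        for ligand in ligands
    ]


def run_all(commands, workers=4):
    """Run the commands concurrently and return their exit codes in order."""
    def run(command):
        return subprocess.run(command).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, commands))
```

A typical call would be `run_all(build_commands("rna.pdb", "ligands/"))`. Threads are sufficient here because each worker only waits for an external process to finish.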
To generate the fingeRNAt documentation file using sphinx:
cd docs
make html
The documentation will be available from _build/html.
To run a unit test:
cd tests
python fingeRNAt_test.py
See: implementation details.
github | contact | |
---|---|---|
Natalia Szulc | @n-szulc | |
Filip Stefaniak | @filipsPL |
We welcome any feedback. Please send an email to Natalia Szulc or submit a bug report.
Discussion and questions may be asked on the discussion page.
Special thanks to Masoud Farsani, Pritha Ghosh, and Tomasz Wirecki for their invaluable feedback, as well as to Prof. Janusz M. Bujnicki and the entire Bujnicki Lab for all the support and project guidelines.
Extensive script testing provided by Zuzanna Mackiewicz has been a great help in developing this tool.
Assistance provided by OpenBabel Community was greatly appreciated.
If you use this software, please cite:
fingeRNAt - a novel tool for high-throughput analysis of nucleic acid-ligand interactions
Natalia A. Szulc, Zuzanna Mackiewicz, Janusz M. Bujnicki, Filip Stefaniak
PLOS Computational Biology
doi: 10.1371/journal.pcbi.1009783
Supplementary code and data regarding the manuscript can be found here.
fingeRNAt is licensed under the GNU General Public License v3.0.