This repository provides an easy-to-use Python module called PyConforMap that generates scatter plots of instantaneous shape ratio (Rs) against relative radius of gyration (Rg/Rgmean).
PLEASE READ ALL DOCUMENTATION
There are two main metrics: the relative radius of gyration (Rg/Rgmean) and the instantaneous shape ratio (Rs). Rs is computed as Rs = Ree^2/Rg^2, where Ree and Rg are the (instantaneous) end-to-end distance and the (instantaneous) radius of gyration, respectively.
Rg/Rgmean is a measure of the (relative) size of a protein or polymer chain, and Rs is a measure of its shape. Rs is expected to be low (~2 or lower) for compact structures and high (~12 or higher) for highly extended structures. Together, a single Rg/Rgmean value and its corresponding Rs value define the instantaneous conformation of a polymer. When all the Rg/Rgmean and Rs values of a polymer are plotted together, they constitute what we call a 2D map of the conformational landscape of that polymer.
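
As a rough illustration of the two metrics, the sketch below converts per-snapshot squared radius of gyration and squared end-to-end distance into (Rg/Rgmean, Rs) pairs. The function name `conformation_coordinates` is hypothetical and is not part of 'pyconformap.py'. The sketch also assumes that Rgmean is the arithmetic mean of Rg over the snapshots supplied.

```python
import numpy as np


def conformation_coordinates(rg2, ree2):
    """Map per-snapshot Rg^2 and Ree^2 values to (Rg/Rgmean, Rs) pairs.

    Hypothetical helper for illustration only; not the module's own API.
    """
    rg2 = np.asarray(rg2, dtype=float)
    ree2 = np.asarray(ree2, dtype=float)
    if rg2.shape != ree2.shape:
        raise ValueError("rg2 and ree2 must have the same shape")

    rg = np.sqrt(rg2)
    # Assumption: Rgmean is the mean of Rg over the snapshots passed in.
    rel_rg = rg / rg.mean()
    # Instantaneous shape ratio, Rs = Ree^2 / Rg^2.
    rs = ree2 / rg2
    return rel_rg, rs


# Three made-up snapshots (values chosen only to keep the arithmetic simple).
rel_rg, rs = conformation_coordinates([1.0, 4.0, 9.0], [6.0, 8.0, 27.0])
print(rel_rg)  # relative size of each snapshot
print(rs)      # shape ratio of each snapshot
```

Each (rel_rg[i], rs[i]) pair is one point on the 2D map described above.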
This module generates 2D scatter plots of Rs against Rg/Rgmean for a protein/polymer simulation (data and protein label/identity provided by the user) and a Gaussian Walk (GW) polymer model simulation (data for 720000 snapshots of a GW model of length 100 are included with the repository, in the file 'GW_chainlen100.csv'). Each point on the scatter plot, whether it belongs to the GW or to a given protein/polymer, represents a conformation snapshot and has coordinates (Rg/Rgmean, Rs). The GW model is intended as a reference model: its conformational landscape map (i.e. the set of all its (Rg/Rgmean, Rs) points) serves as a 'universal' reference map for those of other proteins/polymers. The module can also be used to run a new GW simulation with a different chain length and number of snapshots, should the user wish to do so.
From the 2D scatter plot, the module automatically calculates fC, the fraction of GW points that are 'close' (i.e. within a pre-defined radius) to at least one protein/polymer point. fC represents the conformational diversity of the protein/polymer provided, and can be used to rank the conformational diversities of different proteins/polymers.
On the scatter plot, it is important that the protein/polymer points do not significantly exceed the boundaries defined by the reference (GW) points. Most of the protein/polymer points should be 'close' (i.e. within a pre-defined radius) to at least one GW point.
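
To make the GW reference and the fC calculation more concrete, here is a minimal sketch of both steps. It is not the implementation in 'pyconformap.py'. The function names (`simulate_gw`, `to_map_points`, `fraction_covered`) are hypothetical, and the sketch rests on several assumptions: "chain length" is taken as the number of beads, bond vectors have standard-normal components, 'close' means Euclidean distance in the unscaled (Rg/Rgmean, Rs) plane, and the radius is supplied by the caller.

```python
import numpy as np
from scipy.spatial import cKDTree


def simulate_gw(n_beads, n_snapshots, seed=None):
    """Return (rg2, ree2) arrays for independent 3D Gaussian-walk chains."""
    rng = np.random.default_rng(seed)
    # Assumption: each bond vector has independent standard-normal components.
    bonds = rng.standard_normal((n_snapshots, n_beads - 1, 3))
    # First bead at the origin; later beads are cumulative sums of the bonds.
    origin = np.zeros((n_snapshots, 1, 3))
    positions = np.concatenate([origin, np.cumsum(bonds, axis=1)], axis=1)
    # Rg^2: mean squared distance of the beads from the chain's centroid.
    centered = positions - positions.mean(axis=1, keepdims=True)
    rg2 = (centered ** 2).sum(axis=2).mean(axis=1)
    # Ree^2: squared distance between the first and last bead.
    ree2 = ((positions[:, -1] - positions[:, 0]) ** 2).sum(axis=1)
    return rg2, ree2


def to_map_points(rg2, ree2):
    """Stack (Rg/Rgmean, Rs) into an (n_snapshots, 2) array."""
    rg = np.sqrt(rg2)
    return np.column_stack([rg / rg.mean(), ree2 / rg2])


def fraction_covered(gw_points, chain_points, radius):
    """Fraction of GW points within `radius` of at least one chain point."""
    tree = cKDTree(chain_points)
    # Distance from every GW point to its nearest chain point.
    nearest_dist, _ = tree.query(gw_points, k=1)
    return float(np.mean(nearest_dist <= radius))


# Small demo: a GW reference, and a second GW run standing in for a polymer.
gw_points = to_map_points(*simulate_gw(n_beads=100, n_snapshots=2000, seed=0))
chain_points = to_map_points(*simulate_gw(n_beads=100, n_snapshots=500, seed=1))
# The radius here is an arbitrary illustrative value, not the module's default.
print(fraction_covered(gw_points, chain_points, radius=0.1))
```

Building the tree on the protein/polymer points and querying it with the GW points keeps the calculation to one nearest-neighbour lookup per GW point, which matters when the reference set contains hundreds of thousands of snapshots.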
The required input is a csv file (for a given protein/polymer simulation) with 2 columns. The first column contains Rg^2 values and the second column contains Ree^2 values. In this (user-provided) file, each row represents a protein/polymer conformation snapshot from the simulation. An example input is the 'example_protein.csv' file (included with the repository).
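
Before passing a file to the module, it can help to check that it matches the two-column layout described above. The sketch below is one possible way to do that with pandas. The helper name `load_snapshots` and the column labels `Rg2` and `Ree2` are hypothetical. Whether a given file has a header row is an assumption the caller has to state through `has_header`.

```python
import io

import pandas as pd


def load_snapshots(source, has_header=True):
    """Read a two-column csv of Rg^2 and Ree^2 values, one snapshot per row.

    Hypothetical helper for illustration; not part of 'pyconformap.py'.
    """
    df = pd.read_csv(source, header=0 if has_header else None, usecols=[0, 1])
    # Relabel by position: first column is Rg^2, second column is Ree^2.
    df.columns = ["Rg2", "Ree2"]
    # Raises ValueError if any entry cannot be interpreted as a number.
    df = df.astype(float)
    if df.isna().any().any():
        raise ValueError("input contains missing values")
    if (df["Rg2"] <= 0).any() or (df["Ree2"] < 0).any():
        raise ValueError("Rg^2 must be positive and Ree^2 non-negative")
    return df


# In-memory stand-in for a real file such as 'example_protein.csv'.
demo_csv = io.StringIO("Rg2,Ree2\n1.0,6.0\n4.0,8.0\n")
snapshots = load_snapshots(demo_csv)
print(len(snapshots), "snapshots loaded")
```

For a file on disk, pass its path instead of the `io.StringIO` object.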
The 'code_input_output.md' file provides technical details (input arguments, expected outputs) of the module. The 'pyconformap.py' file contains the source code for the module. The 'illustrated_example.ipynb' Jupyter notebook shows examples that illustrate how to use the code. 'GW_chainlen100.csv' contains the reference GW simulation data, and 'example_protein.csv' contains the simulation data for an example protein.
The module requires the pandas, numpy, matplotlib, scipy and more_itertools Python packages, along with the itertools and random modules from the Python standard library. They are imported automatically when the 'pyconformap.py' file is executed, as shown in the illustrated examples.
PyConforMap is a companion to this publication.
If you use this module, please cite us using the provided DOI.
If you have comments, suggestions or a bug report, please feel free to email me at hossain.shadman17@gmail.com, or contact me through the social media links provided on the home page.