Datasets
The ChEMBL PostgreSQL database can be created in the virtual machine by running the following script:
chembldb_create <chembl_version>
When run, the script:
- Downloads the ChEMBL dump
- Imports the dump
- Creates RDKit tables with molecules (mols_rdkit) and fingerprints (fps_rdkit)
The script takes several hours to finish.
Connect to the database on the command line with the following command, replacing <chembl_version> with the version printed by the script:
PGPASSWORD=chembl psql -h localhost -U chembl chembl_<chembl_version>
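To check that the import finished, it can help to list the RDKit tables from the shell. The sketch below assumes the same host, user and password as the command above. The `chembl_db_name` helper and the example version `23` are hypothetical and only illustrate how the database name is formed from the `chembl_` prefix and the version.

```shell
#!/bin/sh
# Hypothetical helper: derive the database name from a ChEMBL version
# by prepending the chembl_ prefix.
chembl_db_name() {
    printf 'chembl_%s\n' "$1"
}

# Example value only; use the version printed by chembldb_create.
CHEMBL_VERSION="${CHEMBL_VERSION:-23}"
DB_NAME="$(chembl_db_name "$CHEMBL_VERSION")"

# List the tables whose names end in _rdkit (mols_rdkit and fps_rdkit
# are expected). The query is skipped when psql is not installed.
if command -v psql >/dev/null 2>&1; then
    PGPASSWORD=chembl psql -h localhost -U chembl "$DB_NAME" -c '\dt *_rdkit' \
        || echo "Could not query $DB_NAME; has chembldb_create finished?" >&2
fi
```

Setting `CHEMBL_VERSION` in the environment before running the snippet selects a different database without editing it.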
The data sets can be queried using the kripodb command line utility; see https://github.com/3D-e-Chem/kripodb for more information.
A tiny data set is available in the /data/kripo/tiny directory.
- fragments.sqlite - SQLite database containing a small number of fragments with their SMILES string and molblock
- fingerprints.sqlite - SQLite database with fingerprints stored as fastdumped intbitset
- distances.h5 - HDF5 file with the distance matrix of the fingerprints, computed using a modified Tanimoto coefficient
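The table layout of the SQLite files is not described here, so one way to explore them is to ask SQLite itself. The following sketch assumes the `sqlite3` command line shell is installed in the VM; `kripo_list_tables` is a hypothetical helper name, not part of kripodb.

```shell
#!/bin/sh
# Hypothetical helper: print the table names of a SQLite file.
kripo_list_tables() {
    db_file="$1"
    if [ ! -f "$db_file" ]; then
        echo "No such file: $db_file" >&2
        return 1
    fi
    # Query the built-in sqlite_master catalogue instead of assuming a schema.
    sqlite3 "$db_file" \
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name;"
}

# Inspect both SQLite files of the tiny data set, if they are present.
for f in /data/kripo/tiny/fragments.sqlite /data/kripo/tiny/fingerprints.sqlite; do
    if [ -f "$f" ]; then
        echo "Tables in $f:"
        kripo_list_tables "$f"
    fi
done
```

Once the table names are known, `sqlite3 <file> '.schema <table>'` shows the columns of a given table.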
A GPCR data set is available in the /data/kripo/gpcr directory.
It contains all fragments based on GPCR proteins, compared with all proteins in the PDB.
- kripo.gpcrandhits.sqlite - Fragments sqlite database
- kripo.gpcr.h5 - HDF5 file with distance matrix
The data set has been published at
The PDB fragment data set is available in the /data/kripo/pdb directory.
It contains all fragments from all proteins in the PDB, compared all against all.
- fragments.sqlite - Fragments sqlite database, fetched from http://3d-e-chem.vu-compmedchem.nl/kripodb/fragments.sqlite.
- The distance matrix is too big to ship with the VM; query it through the web service at http://3d-e-chem.vu-compmedchem.nl/kripodb instead.
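If a fresh copy of the fragments database is needed, it could be downloaded again from the URL listed above. This is a minimal sketch assuming `curl` is available and the URL is still being served; `fetch_fragments` is a hypothetical helper and nothing is downloaded until it is called.

```shell
#!/bin/sh
# URL of the fragments database, as given in the list above.
FRAGMENTS_URL="http://3d-e-chem.vu-compmedchem.nl/kripodb/fragments.sqlite"

# Hypothetical helper: download the fragments database to the given path
# (defaults to the location used in the VM).
fetch_fragments() {
    dest="${1:-/data/kripo/pdb/fragments.sqlite}"
    # -f: fail on HTTP errors, -L: follow redirects, -o: write to a file.
    curl -fL -o "$dest" "$FRAGMENTS_URL"
}
```

For example, `fetch_fragments /tmp/fragments.sqlite` would write the download to a separate path and leave the file shipped in the VM untouched.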