Datasets

ChEMBL

The ChEMBL PostgreSQL database can be created in the virtual machine by running the following script:

chembldb_create <chembl_version>

The script takes several hours to finish.

Connect to the database on the command line using the following command (replace <chembl_version> with the version printed by the script):

PGPASSWORD=chembl psql -h localhost -U chembl chembl_<chembl_version>
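
The same connection can also be scripted. The following is a minimal sketch, assuming Python 3.7 or later is available in the VM and that psql is on the PATH. The helper names psql_command and run_query are hypothetical, and the example query assumes the loaded release contains the molecule_dictionary table of the standard ChEMBL schema.

```python
import os
import subprocess


def psql_command(chembl_version, sql):
    """Build the psql argument list for the chembl_<chembl_version> database."""
    return [
        "psql",
        "-h", "localhost",
        "-U", "chembl",
        "-A",  # unaligned output
        "-t",  # print rows only, no header or footer
        "-c", sql,
        # Positional database name, as in the command line shown above.
        "chembl_{}".format(chembl_version),
    ]


def run_query(chembl_version, sql):
    """Run a single SQL statement through psql and return its stripped output."""
    # Pass the password through the environment, like the PGPASSWORD=chembl prefix.
    env = dict(os.environ, PGPASSWORD="chembl")
    result = subprocess.run(
        psql_command(chembl_version, sql),
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if psql exits with an error
    )
    return result.stdout.strip()
```

For example, run_query("<chembl_version>", "SELECT COUNT(*) FROM molecule_dictionary;") should return the number of molecules as a string once the database has been created.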

Kripo

The Kripo data sets can be queried using the kripodb command line utility; see https://github.com/3D-e-Chem/kripodb for more information.

A tiny data set is available in the /data/kripo/tiny directory.

fragments.sqlite - Fragments SQLite database containing a small number of fragments with their SMILES string and molblock.
fingerprints.sqlite - Fingerprints SQLite database with each fingerprint stored as a fastdumped intbitset.
distances.h5 - HDF5 file with the distance matrix of the fingerprints, computed using a modified Tanimoto coefficient.
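
The SQLite files can be inspected without kripodb. The sketch below uses only the Python standard library and makes no assumption about the table names inside the files; it reads them from SQLite's own catalogue. The function name table_row_counts is hypothetical.

```python
import sqlite3


def table_row_counts(db_path):
    """Return a dict mapping each table name in a SQLite file to its row count."""
    # Open read-only through a URI so the shipped data set cannot be modified by accident.
    conn = sqlite3.connect("file:{}?mode=ro".format(db_path), uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier; table names cannot be bound as SQL parameters.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                "SELECT COUNT(*) FROM {}".format(quoted)
            ).fetchone()[0]
        return counts
    finally:
        conn.close()
```

Calling table_row_counts("/data/kripo/tiny/fragments.sqlite") inside the VM lists the tables of the tiny fragments database together with their sizes.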

A GPCR data set is available in the /data/kripo/gpcr directory. It contains all fragments based on GPCR proteins, compared with all proteins in the PDB.

The data set has been published at

The PDB fragment data set is available in the /data/kripo/pdb directory. It contains all fragments from all proteins in the PDB, compared all against all.

fragments.sqlite - Fragments SQLite database, fetched from http://3d-e-chem.vu-compmedchem.nl/kripodb/fragments.sqlite.
The distance matrix is too big to ship with the VM, so query it through the webservice URL http://3d-e-chem.vu-compmedchem.nl/kripodb instead.