3D features and other features not working with SMILES #96

Open
ky66 opened this issue Apr 12, 2021 · 2 comments

Comments


ky66 commented Apr 12, 2021

I am trying to use Mordred to compute all 1875 features of my molecules. Mordred fails on the 3D features when given only SMILES. In the 3D feature columns, Mordred reports (missing 3D coordinate (GeomDiameter/mordred._graph_matrix.Diameter3D(True, False)/mordred._graph_matrix.DistanceMatrix3D(True, False))). Some other features are missing as well; their column values are often errors such as (min() arg is an empty sequence (MINtsC)).

I realize that I should give Mordred 3D structures instead of just SMILES. How can I generate 3D coordinates from SMILES using RDKit, and can someone post an example of feeding them to Mordred? Also, how do I solve the second problem, for instance (min() arg is an empty sequence (MINdsCH)), which affects other variables too?

I have also tried the solution in #93, and it does not work.

@mohammednooraldeen

```python
from rdkit import Chem
from mordred import Calculator, descriptors
import numpy as np
import pandas as pd

# List of SMILES strings
smiles_list = [
    'CCO',  # Ethanol
    'CC(=O)O',  # Acetic acid
    'CCN(CC)CC',  # Triethylamine
    'CC(=O)OC1=CC=CC=C1C(=O)O',  # Aspirin
    'C1CCC(CC1)N',  # Cyclohexylamine
]


# Function to convert SMILES to RDKit molecule
def smiles_to_molecule(smiles):
    return Chem.MolFromSmiles(smiles)


# Convert SMILES to molecule objects
molecules = [smiles_to_molecule(smiles) for smiles in smiles_list]

# Initialize Mordred descriptor calculator (3D descriptors are skipped)
calc = Calculator(descriptors, ignore_3D=True)

# Calculate descriptors for each molecule
descriptors_list = []
for mol in molecules:
    if mol is not None:
        # Calculate descriptors and fill missing values with 0
        desc = calc(mol).fill_missing(0)
        descriptors_list.append([d if d is not None else 0 for d in desc])

# Convert list of descriptors to a numpy array
descriptors_array = np.array(descriptors_list)

# Output the array of descriptors
print(descriptors_array)

# Save the descriptors array to a CSV file
df = pd.DataFrame(descriptors_array, columns=[str(d) for d in calc.descriptors])
df.to_csv('molecular_descriptors.csv', index=False)
```

@JacksonBurns

@ky66 Mordred only allows calculating 3D features from MOL or SDF files. However, you can use RDKit to generate a 3D conformer and then calculate the 3D features from that.
