Add QM9 dataset #403

Merged: 14 commits, Oct 13, 2023

Conversation

Contributor

@n0w0f n0w0f commented Oct 5, 2023

add qm9 dataset #402

@n0w0f n0w0f changed the title from "n0w0f/qm9" to "Add QM9 dataset" on Oct 5, 2023
{u298#} Hartree at 298.15 K when calculated using {#Density Functional Theory|DFT!} with B3LYP {#exchange correlation functional|functional|accuracy!}.
- The {#molecule|compound|chemical!} with the {smiles__description} representation of {smiles#} when calculated with B3LYP DFT simulations has an enthalpy
of {h298#} Hartree at 298.15 K.
- The {smiles__description} {smiles#} {#represents|is from!} a {#molecule|compound|chemical|molecular species|chemical compound!} that has a Gibbs free
Collaborator

That is super nice! I like all the synonyms!
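
For readers unfamiliar with the template syntax quoted above, here is a minimal sketch of how such a line could be expanded into a sentence. It assumes that `{#a|b|c!}` means "pick one of the alternatives at random" and that `{name#}` and `{name}` are filled from a data row; the `render` helper, the regular expressions, and the example values are hypothetical and are not taken from the project's sampling code.

```python
import random
import re

# Assumed syntax: {#a|b|c!} is a synonym group, {name#} / {name} is a field.
CHOICE = re.compile(r"\{#([^{}]*)!\}")
FIELD = re.compile(r"\{(\w+)#?\}")


def render(template, row, rng=None):
    """Pick one synonym per group, then fill the fields from `row`."""
    if rng is None:
        rng = random.Random()
    # Replace every synonym group with one randomly chosen alternative.
    text = CHOICE.sub(lambda m: rng.choice(m.group(1).split("|")), template)
    # Replace every remaining placeholder with the matching row value.
    return FIELD.sub(lambda m: str(row[m.group(1)]), text)


template = (
    "The {smiles__description} {smiles#} {#represents|is from!} a "
    "{#molecule|compound|chemical!} with an enthalpy of {h298#} Hartree at 298.15 K."
)
# Made-up row for illustration only; the number is not a QM9 value.
row = {"smiles__description": "SMILES", "smiles": "C", "h298": -1.0}

sentence = render(template, row, random.Random(0))
print(sentence)
```

Passing a seeded `random.Random` makes the synonym choice reproducible, which helps when comparing generated text across runs.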

filename_to_save = "data_clean.csv"

# Load the dataset from Hugging Face
dataset = datasets.load_dataset(dataset_name, split=split_name)
Collaborator

That is super awesome! Thanks for directly uploading it to HF! 💯 Very happy to see that!
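
The excerpt above shows only two lines of the script, so here is a hedged, self-contained sketch of the same pattern: load one split from the Hugging Face Hub and write it to `data_clean.csv`. The function name `save_split_as_csv` and the `loader` parameter are hypothetical additions for illustration; `datasets.load_dataset` and `Dataset.to_csv` are existing calls in the `datasets` library. The actual repository id used in the PR is not shown in the excerpt, so none is filled in here.

```python
def save_split_as_csv(dataset_name, split_name,
                      filename_to_save="data_clean.csv", loader=None):
    """Load one split of a Hub dataset and write it to a CSV file.

    `loader` is a hypothetical hook so the function can be exercised
    without network access; by default it is `datasets.load_dataset`.
    """
    if loader is None:
        import datasets  # imported lazily; requires `pip install datasets`
        loader = datasets.load_dataset

    # Load the dataset from Hugging Face
    dataset = loader(dataset_name, split=split_name)
    # Write the whole split to disk as CSV
    dataset.to_csv(filename_to_save)
    return filename_to_save
```

Calling `save_split_as_csv(dataset_name, "train")` with a real repository id would download the split and produce the CSV in the working directory.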

Collaborator

@kjappelbaum kjappelbaum left a comment

Thanks 🔥

Collaborator

kjappelbaum commented Oct 13, 2023

We should also consider adding the XYZ files to some templates. For this, we would need to update the sampling code to treat XYZ as an identifier and then do something like https://gist.github.com/kjappelbaum/a5f855945582c3f00c4bb245e5432bfe (#380, #393).

data_clean would then contain either a column with the XYZ file contents or a file path to the file.
We would read this and wrap it in a standardized block, so the same kinds of templates can be used.

Importantly, this might also let us resolve the influence of changes in the geometry.

Let's keep this for another PR, but it would be a very useful addition!
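
As a rough illustration of the idea in this comment (not the contents of the linked gist), the sketch below reads an XYZ entry that is either inline text or a file path and wraps it in a delimited block. The helper names `load_xyz` and `wrap_xyz` and the `[XYZ]` / `[/XYZ]` delimiters are assumptions chosen for this example.

```python
from pathlib import Path


def load_xyz(cell):
    """Return XYZ text from a cell holding either the text or a file path."""
    candidate = cell.strip()
    # Assumption: XYZ content spans several lines, a path is a single line.
    if "\n" not in candidate and Path(candidate).is_file():
        return Path(candidate).read_text()
    return cell


def wrap_xyz(xyz_text, start="[XYZ]", end="[/XYZ]"):
    """Wrap XYZ text in hypothetical start/end delimiters."""
    return f"{start}\n{xyz_text.strip()}\n{end}"


# Made-up geometry for illustration only.
inline = "3\nwater\nO 0.0 0.0 0.0\nH 0.0 0.0 1.0\nH 0.0 1.0 0.0\n"
block = wrap_xyz(load_xyz(inline))
print(block)
```

The wrapped block could then be substituted into a template the same way a SMILES string is, which keeps the templates identical across identifier types.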

@kjappelbaum kjappelbaum merged commit e15bc69 into OpenBioML:main Oct 13, 2023
@kjappelbaum kjappelbaum mentioned this pull request Oct 15, 2023