Add QM9 dataset #403

n0w0f · 2023-10-05T09:57:01Z

add qm9 dataset #402

data/tabular/qm9/meta.yaml

Co-authored-by: Kevin M Jablonka <32935233+kjappelbaum@users.noreply.github.com>

kjappelbaum · 2023-10-13T05:07:00Z

data/tabular/qm9/meta.yaml

+      {u298#} Hartree at 298.15 K when calculated using {#Density Functional Theory|DFT!} with B3LYP {#exchange correlation functional|functional|accuracy!}.
+    - The {#molecule|compound|chemical!} with the {smiles__description} representation of {smiles#} when calculated with B3LYP DFT simulations has an enthalpy
+      of {h298#} Hartree at 298.15 K.
+    - The {smiles__description} {smiles#} {#represents|is from!} a {#molecule|compound|chemical|molecular species|chemical compound!} that has a Gibbs free


That is super nice! I like all the synonyms!
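
For readers unfamiliar with the template syntax, here is a minimal sketch of how such synonym groups could be resolved. It is an illustration, not the project's sampling code: the function name `fill_template`, the regular expressions, and the row values are all assumptions, and the syntax rules are only inferred from the templates quoted above.

```python
import random
import re

# Assumed syntax, inferred from the templates quoted above:
#   {#a|b|c!}            -> one of the options, picked at random
#   {name#}              -> the value stored under "name"
#   {name__description}  -> a human-readable description stored under that key
CHOICE = re.compile(r"\{#([^{}]*?)!\}")
VALUE = re.compile(r"\{(\w+)#\}")
DESCRIPTION = re.compile(r"\{(\w+__description)\}")


def fill_template(template, row, rng):
    """Resolve synonym groups first, then substitute values from the row."""
    text = CHOICE.sub(lambda m: rng.choice(m.group(1).split("|")), template)
    text = VALUE.sub(lambda m: str(row[m.group(1)]), text)
    text = DESCRIPTION.sub(lambda m: str(row[m.group(1)]), text)
    return text


template = (
    "The {#molecule|compound|chemical!} with the {smiles__description} "
    "representation of {smiles#} has an enthalpy of {h298#} Hartree at 298.15 K."
)
# Made-up row for illustration; the number is not taken from the dataset.
row = {"smiles": "C", "smiles__description": "SMILES", "h298": -40.0}
print(fill_template(template, row, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the sampled wording reproducible, which helps when comparing generated text across runs.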

data/tabular/qm9/prep_csv.py

kjappelbaum · 2023-10-13T05:09:17Z

data/tabular/qm9/transform.py

+    filename_to_save = "data_clean.csv"
+
+    # Load the dataset from Hugging Face
+    dataset = datasets.load_dataset(dataset_name, split=split_name)


That is super awesome! Thanks for directly uploading it to HF! 💯 Very happy to see that!

data/tabular/qm9/prep_csv.py

data/tabular/qm9/meta.yaml

kjappelbaum

Thanks 🔥

kjappelbaum · 2023-10-13T05:55:23Z

We should also consider adding the XYZ files to some templates. For this, we would need to update the sampling code to treat XYZ as an identifier and then do something like https://gist.github.com/kjappelbaum/a5f855945582c3f00c4bb245e5432bfe (#380, #393).

data_clean would then contain either a column with the XYZ file or a filepath to the file.
We would then read this, wrap it into a standardized block, and reuse the same kinds of templates.

But, importantly, we might now also resolve the influence of changes in the geometry.

Let's keep this for another PR, but it would be a very useful addition!
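
A rough sketch of the "wrap into a standardized block" step, to make the idea concrete. It is not the content of the linked gist: the function name `wrap_xyz` and the `[XYZ]`/`[/XYZ]` delimiters are hypothetical, and the water geometry is only illustrative. It assumes the standard XYZ layout (atom count, comment line, one line per atom) and drops any trailing lines after the atoms.

```python
def wrap_xyz(xyz_text, start="[XYZ]", end="[/XYZ]"):
    """Wrap the contents of an XYZ file in delimiters (hypothetical markers)."""
    lines = xyz_text.strip().splitlines()
    n_atoms = int(lines[0].strip())  # first line: number of atoms
    header = lines[:2]  # atom count and comment line
    atoms = lines[2:2 + n_atoms]  # one line per atom; extra lines are dropped
    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atom lines, found {len(atoms)}")
    return "\n".join([start, *header, *atoms, end])


# Illustrative geometry only, not a QM9 entry.
water = """3
water
O 0.000 0.000 0.117
H 0.000 0.757 -0.469
H 0.000 -0.757 -0.469
"""
print(wrap_xyz(water))
```

If data_clean stored a filepath instead of the file contents, the same function could be applied to the text read from that path.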

n0w0f added 4 commits October 5, 2023 01:23

chore: lint

421a681

feat: add transform.py to clean and prep data

35aff67

feat: add meta.yaml

5562426

feat: add qm9 dataset

eddd2c0

n0w0f changed the title ~~n0w0f/qm9~~ Add QM9 dataset Oct 5, 2023

kjappelbaum reviewed Oct 5, 2023

data/tabular/qm9/meta.yaml (outdated, resolved)

kjappelbaum reviewed Oct 5, 2023

data/tabular/qm9/meta.yaml (outdated, resolved)

kjappelbaum reviewed Oct 5, 2023

data/tabular/qm9/meta.yaml (outdated, resolved)

kjappelbaum reviewed Oct 5, 2023

data/tabular/qm9/meta.yaml (outdated, resolved)

kjappelbaum reviewed Oct 5, 2023

data/tabular/qm9/meta.yaml (outdated, resolved)

n0w0f and others added 4 commits October 8, 2023 15:24

fix: apply suggestions from code review

093053c

Co-authored-by: Kevin M Jablonka <32935233+kjappelbaum@users.noreply.github.com>

fix: remove inchi templates & typos

75f0fe9

fix: variable to catch return from preparte data (no of data points)

78f7c5c

fix: better sampling in templates

e1d56df

kjappelbaum reviewed Oct 13, 2023

data/tabular/qm9/prep_csv.py (resolved)

Update data/tabular/qm9/prep_csv.py

1990e55

kjappelbaum reviewed Oct 13, 2023

data/tabular/qm9/prep_csv.py (outdated, resolved)

Update data/tabular/qm9/prep_csv.py

05f6f81

kjappelbaum reviewed Oct 13, 2023

data/tabular/qm9/meta.yaml (outdated, resolved)

Update data/tabular/qm9/meta.yaml

acb576a

kjappelbaum reviewed Oct 13, 2023

data/tabular/qm9/meta.yaml (outdated, resolved)

kjappelbaum and others added 2 commits October 13, 2023 07:11

Update data/tabular/qm9/meta.yaml

5846f0d

update for sampling

beda4bc

kjappelbaum approved these changes Oct 13, 2023

remove lint

12133a7

kjappelbaum merged commit e15bc69 into OpenBioML:main Oct 13, 2023

kjappelbaum mentioned this pull request Oct 15, 2023

Add Qm9 dataset #402

Closed
