From 40ec0aef0d3c2b9d353b468211bed7dd61405531 Mon Sep 17 00:00:00 2001
From: Alexandra McIsaac
Date: Fri, 1 Nov 2024 15:24:31 -0700
Subject: [PATCH 1/3] Update README.md with details about compute expansion

---
 README.md | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index e63cf7f8..902ae824 100644
--- a/README.md
+++ b/README.md
@@ -91,14 +91,27 @@ The programatic description is provided below, with an example of the notebook a
 
 1. Create a new branch as described above, and navigate to the submission directory of the dataset you want to expand.
 2. Create a new jupyter notebook called `generate-compute.ipynb` [example here](https://github.com/openforcefield/qca-dataset-submission/blob/master/submissions/2024-09-18-OpenFF-NAGL2-ESP-Timing-Benchmark-v1.1/generate-compute.ipynb).
-3. In the notebook, either download the original dataset and remove the molecules and _original_ `QCSpec`, or re-create the dataset with the same metadata as the original (e.g. same name, description, etc) and skip the molecule addition step.
+3. In the notebook, either download the original dataset and remove the molecules and _original_ `QCSpec`, or re-create the dataset with the same name as the original and skip the molecule addition step. See below for details about how changes to the dataset are propogated; note that the dataset name must be the same, and changes to any metadata except `compute-tag` and the `QCSpec` will be ignored when submitting the compute expansion.
 * Please note that the default `compute_tag` is `openff`; if you need to use a different one, please add it explicitly to the dataset at this step, as the `compute.json` file overrides the compute tag added manually to the PR. If you do need to change the compute tag after submission, you can change it by updating the label on the PR and the change will take effect when the error cycling action runs next.
-4. Add the _new_ `QCSpec` to the dataset, and save the dataset to `compute.json`, example [here](https://github.com/openforcefield/qca-dataset-submission/blob/add-ddx-to-nagl-benchmark/submissions/2024-09-18-OpenFF-NAGL2-ESP-Timing-Benchmark-v1.1/compute.json).
+4. Add the _new_ `QCSpec` to the dataset, and save the dataset to `compute.json`, example [here](https://github.com/openforcefield/qca-dataset-submission/blob/add-ddx-to-nagl-benchmark/submissions/2024-09-18-OpenFF-NAGL2-ESP-Timing-Benchmark-v1.1/compute.json).
 5. Add the additional compute spec to the submission's `README.md` file.
 6. Add the `generate-compute.ipynb` and `compute.json` files to the submission's `QCSubmit Manifest` entry in the `README.md` file.
 7. Proof the submission and open a PR. Dataset validation will run automatically.
 8. Once the dataset is validated, request a review, and once approved, your compute expansion will be submitted!
 
+When the PR is merged, the following happens:
+
+* CI checks for `compute*.json*`, so files can be called anything so long as they follow that pattern.
+
+* This gets loaded into a QCSubmit `dataset` structure in CI (see `lifecycle.py`, `SubmittableBase`) and submitted to MolSSI with `openff.qcsubmit.datasets.datasets._BaseDataset.submit`
+
+* `submit()` checks if the dataset already exists using only the dataset type and name. So changes in descriptions, etc., other metadata don't affect anything. New / different molecules will also be ignored if the dataset name already exists.
+
+* `submit()` adds the specifications
+
+* `submit()` submits with the `compute_tag` and `priority` within the new `compute.json`.
+
+* Other info in the dataset, such as `dataset_tags`, is not incorporated into additional compute submissions, so changing it will not affect the dataset.
 
 
 # The Lifecycle of a Dataset Submission

From 259e2860a8d9c85aeeda804e8ce8e24956f24433 Mon Sep 17 00:00:00 2001
From: Alexandra McIsaac
Date: Fri, 1 Nov 2024 15:31:19 -0700
Subject: [PATCH 2/3] Update README.md

---
 README.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 902ae824..29980248 100644
--- a/README.md
+++ b/README.md
@@ -91,7 +91,8 @@ The programatic description is provided below, with an example of the notebook a
 
 1. Create a new branch as described above, and navigate to the submission directory of the dataset you want to expand.
 2. Create a new jupyter notebook called `generate-compute.ipynb` [example here](https://github.com/openforcefield/qca-dataset-submission/blob/master/submissions/2024-09-18-OpenFF-NAGL2-ESP-Timing-Benchmark-v1.1/generate-compute.ipynb).
-3. In the notebook, either download the original dataset and remove the molecules and _original_ `QCSpec`, or re-create the dataset with the same name as the original and skip the molecule addition step. See below for details about how changes to the dataset are propogated; note that the dataset name must be the same, and changes to any metadata except `compute-tag` and the `QCSpec` will be ignored when submitting the compute expansion.
+3. In the notebook, either download the original dataset and remove the molecules and _original_ `QCSpec`, or re-create the dataset with the same name as the original and skip the molecule addition step.
+* See below for details about how changes to the dataset are propagated; note that the dataset name must be the same, and changes to any metadata except `compute_tag` and the `QCSpec` will be ignored when submitting the compute expansion.
 * Please note that the default `compute_tag` is `openff`; if you need to use a different one, please add it explicitly to the dataset at this step, as the `compute.json` file overrides the compute tag added manually to the PR. If you do need to change the compute tag after submission, you can change it by updating the label on the PR and the change will take effect when the error cycling action runs next.
 4. Add the _new_ `QCSpec` to the dataset, and save the dataset to `compute.json`, example [here](https://github.com/openforcefield/qca-dataset-submission/blob/add-ddx-to-nagl-benchmark/submissions/2024-09-18-OpenFF-NAGL2-ESP-Timing-Benchmark-v1.1/compute.json).
 5. Add the additional compute spec to the submission's `README.md` file.
@@ -103,9 +104,9 @@ When the PR is merged, the following happens:
 
 * CI checks for `compute*.json*`, so files can be called anything so long as they follow that pattern.
 
-* This gets loaded into a QCSubmit `dataset` structure in CI (see `lifecycle.py`, `SubmittableBase`) and submitted to MolSSI with `openff.qcsubmit.datasets.datasets._BaseDataset.submit`
+* This gets loaded into a QCSubmit `dataset` structure in CI (see `lifecycle.py`, `SubmittableBase`) and submitted to MolSSI with `openff.qcsubmit.datasets.datasets._BaseDataset.submit()`
 
-* `submit()` checks if the dataset already exists using only the dataset type and name. So changes in descriptions, etc., other metadata don't affect anything. New / different molecules will also be ignored if the dataset name already exists.
+* `submit()` checks if the dataset already exists using only the dataset type and name. Changes to the description or other metadata therefore have no effect on the existing dataset. New or different molecules are also ignored if the dataset name already exists.
 
 * `submit()` adds the specifications
 

From f3cbe78c6ee98c5a2bf558e79594777db7a0cf8f Mon Sep 17 00:00:00 2001
From: Alexandra McIsaac
Date: Mon, 4 Nov 2024 08:37:49 -0800
Subject: [PATCH 3/3] adding links

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 29980248..9962ee54 100644
--- a/README.md
+++ b/README.md
@@ -104,7 +104,7 @@ When the PR is merged, the following happens:
 
 * CI checks for `compute*.json*`, so files can be called anything so long as they follow that pattern.
 
-* This gets loaded into a QCSubmit `dataset` structure in CI (see `lifecycle.py`, `SubmittableBase`) and submitted to MolSSI with `openff.qcsubmit.datasets.datasets._BaseDataset.submit()`
+* This gets loaded into a QCSubmit `dataset` structure in CI (see `lifecycle.py`, [`SubmittableBase`](https://github.com/openforcefield/qca-dataset-submission/blob/master/management/lifecycle.py#L333)) and submitted to MolSSI with [`openff.qcsubmit.datasets.datasets._BaseDataset.submit()`](https://github.com/openforcefield/openff-qcsubmit/blob/main/openff/qcsubmit/datasets/datasets.py#L174)
 
 * `submit()` checks if the dataset already exists using only the dataset type and name. Changes to the description or other metadata therefore have no effect on the existing dataset. New or different molecules are also ignored if the dataset name already exists.
 
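The merge-time behaviour documented in this patch series can be illustrated with a small, self-contained Python sketch. It is a toy model, not the code in `lifecycle.py` or openff-qcsubmit, and it rests on two assumptions: that CI matches file names against `compute*.json*` as a shell-style glob, and that an existing dataset is looked up only by its (type, name) pair, so a compute expansion contributes nothing but its new specifications. `find_compute_files`, `ToyDataset`, and `toy_submit` are hypothetical names chosen for illustration.

```python
"""Toy model of what happens when a compute-expansion PR is merged.

Hypothetical sketch, NOT the real lifecycle.py / openff-qcsubmit code.
Assumptions:
  * CI treats `compute*.json*` as a shell-style glob on file names.
  * An existing dataset is identified only by (dataset type, name).
"""
from dataclasses import dataclass, field
from fnmatch import fnmatchcase

COMPUTE_PATTERN = "compute*.json*"  # pattern quoted in the README text


def find_compute_files(filenames):
    """Return the file names that match the compute-expansion pattern."""
    return [name for name in filenames if fnmatchcase(name, COMPUTE_PATTERN)]


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset stored on the server."""
    dataset_type: str
    name: str
    description: str
    molecules: set = field(default_factory=set)
    specs: set = field(default_factory=set)


def toy_submit(server, dataset_type, name, description, molecules, specs):
    """Mimic the documented behaviour of submit() on a dict-based 'server'."""
    key = (dataset_type, name)  # only type and name are used for the lookup
    existing = server.get(key)
    if existing is None:
        # First submission: the whole local dataset is stored.
        server[key] = ToyDataset(dataset_type, name, description, set(molecules), set(specs))
        return server[key]
    # Dataset already exists: only the specifications are added;
    # the new description and any new molecules are ignored.
    existing.specs |= set(specs)
    return existing


if __name__ == "__main__":
    files = ["README.md", "generate-compute.ipynb", "compute.json", "compute-ddx.json.bz2"]
    print(find_compute_files(files))  # ['compute.json', 'compute-ddx.json.bz2']

    server = {}
    toy_submit(server, "basic", "Example Dataset v1.0", "original", {"mol-a"}, {"spec-a"})
    ds = toy_submit(server, "basic", "Example Dataset v1.0", "edited", {"mol-b"}, {"spec-b"})
    print(ds.description, sorted(ds.molecules), sorted(ds.specs))
    # original ['mol-a'] ['spec-a', 'spec-b']
```

Under these assumptions, `generate-compute.ipynb` is not picked up because it does not start with `compute`, while the second `toy_submit` call only extends the specifications of the dataset created by the first. Using `fnmatchcase` rather than `fnmatch` keeps the match case-sensitive on every platform.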