From 694125ec53d02193614711e197ecea1002e37250 Mon Sep 17 00:00:00 2001
From: Matthew Evans
Date: Mon, 2 Oct 2023 16:10:17 +0200
Subject: [PATCH 1/2] Add a first draft of the docs

---
 README.md                               |  4 +-
 src/mcloud_implementation/docs/index.md | 73 +++++++++++++++++++++++++
 2 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100644 src/mcloud_implementation/docs/index.md

diff --git a/README.md b/README.md
index 1706d6e..afbca56 100644
--- a/README.md
+++ b/README.md
@@ -16,13 +16,15 @@ and hosted as [OPTIMADE APIs](https://optimade.org), enabling enhanced data disc
 explorability.
 
 This prototype repository contains two Python packages that work towards this
-aim.
+aim, as well as example scripts for deployment.
 
 - `mc_optimade`: defines a config file format for annotating archives and registered the desired OPTIMADE entries, and a workflow for ingesting them and converting into OPTIMADE types using pre-existing parsers (e.g., ASE for structures). The archive is converted into an intermediate [OPTIMADE JSON Lines](https://github.com/Materials-Consortia/OPTIMADE/issues/471#issuecomment-1589274856) format that can be ingested into a database and used to serve a full OPTIMADE API.
 - `optimade_launch`: provides a platform for launching an OPTIMADE API server from such a JSON lines file. It does so using the [`optimade-python-tools`](https://github.com/Materials-Consortia/optimade-python-tools/) reference server implementation.
+- `mcloud_implementation`: A set of tools and configuration used to deploy the
+"archive watcher" and associated scrapers for MCA.
 
 ## Relevant links
 
diff --git a/src/mcloud_implementation/docs/index.md b/src/mcloud_implementation/docs/index.md
new file mode 100644
index 0000000..a3e82db
--- /dev/null
+++ b/src/mcloud_implementation/docs/index.md
@@ -0,0 +1,73 @@
+# Materials Cloud Archive OPTIMADE integration
+
+Users can now specify that their Materials Cloud Archive (MCA) submissions be
+hosted with an [OPTIMADE API](https://optimade.org), allowing structural data
+(and otherwise) to be queried by OPTIMADE clients.
+This makes any structural data more discoverable, as structures and their
+properties will be returned alongside queries to other major [data
+providers](https://www.optimade.org/providers-dashboard/), and additionally
+enables future programmatic re-use of the data.
+This approach has already found use in select cases where AiiDA graphs were
+exported and stored by MCA, and subsequently exposed with OPTIMADE APIs, but now
+the functionality can be used on many common data types such as those understood
+by [ASE](https://wiki.fysik.dtu.dk/ase/) and [pymatgen](https://pymatgen.org).
+
+To enable this for an MCA submission, users must provide an additional config
+file at the top-level of their submission, named `optimade.yml`.
+The contents of this file will instruct the MCA data pipelines to ingest data
+from supported formats, then create and expose a queryable database.
+The full config file format, with examples, is described in the
+[MCA-OPTIMADE integration GitHub repository](https://github.com/materialscloud-org/archive-optimade-integration/).
+
+## Example
+
+As a simple illustration of the functionality, let's say a user is submitting a
+.zip file containing Crystallographic Information Files (CIF) describing the
+outputs of some calculations, with a simple `.csv` file describing computed
+properties of those crystals.
+
+In this case, the config file first has to describe where the structural
+data can be found, e.g.,:
+
+```yaml
+entries:
+  - entry_type: structures
+    entry_paths:
+      - file: structures.zip
+        matches:
+          - structures/cifs/*.cif
+```
+
+Here, ASE will be used to parse the CIF files, and the
+[optimade-python-tools](https://github.com/Materials-Consortia/optimade-python-tools)
+library will be used to construct an OPTIMADE structure object.
+
+The location of the computed properties can then be defined in a similar way (continuing the `entries->entry_type` block):
+
+```yaml
+entries:
+  - entry_type: structures
+    property_paths:
+      - file:
+        matches:
+          - data/data.csv
+          - data/data2.csv
+```
+
+Finally, definitions for the properties found in the `.csv` files can be
+configured for enhanced sharing via OPTIMADE:
+
+```yaml
+entries:
+  - entry_type: structures
+    property_definitions:
+      - name: energy
+        title: Total energy per atom
+        description: The total energy per atom as computed by GGA-DFT.
+        unit: eV/atom
+        type: float
+```
+
+which will enable database queries over these properties, and easier re-use by other scientists.
+
+This full example, along with more complex examples, can be found on GitHub at [materialscloud-org/arcihve-optimade-integration](https://github.com/materialscloud-org/archive-optimade-integration/tree/main/src/mc_optimade/examples/folder_of_cifs).
From 9e703b338f20990f7c16f63c116b7f01a7cb2dc3 Mon Sep 17 00:00:00 2001
From: Kristjan Eimre
Date: Tue, 3 Oct 2023 18:31:08 +0200
Subject: [PATCH 2/2] tiny changes

---
 src/mcloud_implementation/docs/index.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mcloud_implementation/docs/index.md b/src/mcloud_implementation/docs/index.md
index a3e82db..46ab543 100644
--- a/src/mcloud_implementation/docs/index.md
+++ b/src/mcloud_implementation/docs/index.md
@@ -7,13 +7,13 @@ This makes any structural data more discoverable, as structures and their
 properties will be returned alongside queries to other major [data
 providers](https://www.optimade.org/providers-dashboard/), and additionally
 enables future programmatic re-use of the data.
-This approach has already found use in select cases where AiiDA graphs were
+This approach has already found use in select cases where AiiDA databases were
 exported and stored by MCA, and subsequently exposed with OPTIMADE APIs, but now
 the functionality can be used on many common data types such as those understood
 by [ASE](https://wiki.fysik.dtu.dk/ase/) and [pymatgen](https://pymatgen.org).
 
 To enable this for an MCA submission, users must provide an additional config
-file at the top-level of their submission, named `optimade.yml`.
+file at the top-level of their submission, named `optimade.yaml`.
 The contents of this file will instruct the MCA data pipelines to ingest data
 from supported formats, then create and expose a queryable database.
 The full config file format, with examples, is described in the
@@ -70,4 +70,4 @@ entries:
 
 which will enable database queries over these properties, and easier re-use by other scientists.
 
-This full example, along with more complex examples, can be found on GitHub at [materialscloud-org/arcihve-optimade-integration](https://github.com/materialscloud-org/archive-optimade-integration/tree/main/src/mc_optimade/examples/folder_of_cifs).
+This full example, along with more complex examples, can be found on GitHub at [materialscloud-org/archive-optimade-integration](https://github.com/materialscloud-org/archive-optimade-integration/tree/main/src/mc_optimade/examples/folder_of_cifs).
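
Reviewer note on the `property_definitions` example in the new docs: the Python sketch below illustrates one way a declared `type` could be applied to the columns of the `.csv` property files during ingestion. It is not the `mc_optimade` implementation. The `cast_properties` helper, the `CASTERS` table, the `id` column, and the sample CSV contents are all hypothetical; only the `name`, `unit`, and `type` keys come from the documented example.

```python
import csv
import io

# Mirrors the `property_definitions` entry from the docs example, as a plain dict.
PROPERTY_DEFINITIONS = [
    {"name": "energy", "unit": "eV/atom", "type": "float"},
]

# Hypothetical mapping from declared type names to Python casters.
CASTERS = {"float": float, "integer": int, "string": str}


def cast_properties(csv_text, definitions, id_column="id"):
    """Return {entry id: {property name: typed value}} for the defined columns."""
    casters = {d["name"]: CASTERS[d["type"]] for d in definitions}
    results = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Only columns that have a matching property definition are kept.
        results[row[id_column]] = {
            name: cast(row[name]) for name, cast in casters.items() if name in row
        }
    return results


# Hypothetical contents of data/data.csv; the real column layout may differ.
SAMPLE_CSV = "id,energy\nstructure_1,-3.5\nstructure_2,-1.25\n"

properties = cast_properties(SAMPLE_CSV, PROPERTY_DEFINITIONS)
print(properties)
```

Typing the values at ingestion time is what would let a numeric comparison over `energy` behave as a number rather than a string, which is the kind of database query the docs describe.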