Add a first draft of the usage docs #40

Merged 2 commits on Oct 3, 2023
4 changes: 3 additions & 1 deletion README.md
@@ -16,13 +16,15 @@ and hosted as [OPTIMADE APIs](https://optimade.org), enabling enhanced data disc
explorability.

This prototype repository contains two Python packages that work towards this
aim.
aim, as well as example scripts for deployment.

- `mc_optimade`: defines a config file format for annotating archives and registering the desired OPTIMADE entries, and a workflow for ingesting them and converting them into OPTIMADE types using pre-existing parsers (e.g., ASE for structures). The archive is converted into an intermediate [OPTIMADE JSON Lines](https://github.com/Materials-Consortia/OPTIMADE/issues/471#issuecomment-1589274856) format that can be ingested into a database and used to serve a full OPTIMADE API.
- `optimade_launch`: provides a platform for launching an OPTIMADE API server
from such a JSON lines file. It does so using the
[`optimade-python-tools`](https://github.com/Materials-Consortia/optimade-python-tools/)
reference server implementation.
- `mcloud_implementation`: A set of tools and configuration used to deploy the
"archive watcher" and associated scrapers for MCA.

## Relevant links

73 changes: 73 additions & 0 deletions src/mcloud_implementation/docs/index.md
@@ -0,0 +1,73 @@
# Materials Cloud Archive OPTIMADE integration

Users can now request that their Materials Cloud Archive (MCA) submissions be
hosted with an [OPTIMADE API](https://optimade.org), allowing structural (and
other) data to be queried by OPTIMADE clients.
This makes any structural data more discoverable, as structures and their
properties will be returned alongside queries to other major [data
providers](https://www.optimade.org/providers-dashboard/), and additionally
enables future programmatic re-use of the data.
This approach has already been used in select cases where AiiDA databases were
exported and stored by MCA and subsequently exposed with OPTIMADE APIs. The
functionality can now be used with many common data formats, such as those
understood by [ASE](https://wiki.fysik.dtu.dk/ase/) and [pymatgen](https://pymatgen.org).

To enable this for an MCA submission, users must provide an additional config
file, named `optimade.yaml`, at the top level of their submission.
The contents of this file will instruct the MCA data pipelines to ingest data
from supported formats, then create and expose a queryable database.
The full config file format, with examples, is described in the
[MCA-OPTIMADE integration GitHub repository](https://github.com/materialscloud-org/archive-optimade-integration/).

## Example

As a simple illustration of the functionality, let's say a user is submitting a
`.zip` file containing Crystallographic Information Files (CIFs) that describe
the outputs of some calculations, along with a simple `.csv` file describing
computed properties of those crystals.

In this case, the config file first has to describe where the structural
data can be found, e.g.:

```yaml
entries:
  - entry_type: structures
    entry_paths:
      - file: structures.zip
        matches:
          - structures/cifs/*.cif
```

Here, ASE will be used to parse the CIF files, and the
[optimade-python-tools](https://github.com/Materials-Consortia/optimade-python-tools)
library will be used to construct an OPTIMADE structure object.
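
To make the `file`/`matches` pairing concrete, here is a minimal Python sketch of how glob patterns could be used to select members of a zip archive. It is an illustration only, not the MCA pipeline's actual implementation: the helper `select_members` is hypothetical, and the use of `fnmatch`-style matching (where `*` also matches `/`) is an assumption about the pattern semantics.

```python
import fnmatch
import io
import zipfile


def select_members(archive: zipfile.ZipFile, patterns: list[str]) -> list[str]:
    """Return the sorted archive member names that match any glob pattern.

    Hypothetical helper: fnmatch-style matching is assumed here, in which
    `*` can also match path separators.
    """
    return sorted(
        name
        for name in archive.namelist()
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)
    )


# Build a small in-memory archive standing in for `structures.zip`.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("structures/cifs/1.cif", "data_1\n")
    archive.writestr("structures/cifs/2.cif", "data_2\n")
    archive.writestr("structures/README.txt", "not a structure\n")

# Apply the `matches` patterns from the config example above.
with zipfile.ZipFile(buffer) as archive:
    selected = select_members(archive, ["structures/cifs/*.cif"])

print(selected)
```

Only the two CIF files are selected; the README inside the archive is ignored because it matches none of the patterns.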

The location of the computed properties can then be defined in a similar way (continuing the `entries->entry_type` block):

```yaml
entries:
  - entry_type: structures
    property_paths:
      - file:
        matches:
          - data/data.csv
          - data/data2.csv
```
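
As a rough sketch of what combining several property files might involve, the following Python snippet merges rows from two CSV files into one property dictionary per entry. The CSV layout is an assumption made for illustration: the `id` column used to link rows to structures, the `band_gap` column, and the helper `load_properties` are all hypothetical and are not taken from the MCA pipeline.

```python
import csv
import io


def load_properties(
    csv_texts: list[str], id_column: str = "id"
) -> dict[str, dict[str, str]]:
    """Merge rows from several CSV documents into one dict per entry ID.

    Hypothetical helper: assumes every CSV has an `id_column` that
    identifies the structure each row belongs to.
    """
    merged: dict[str, dict[str, str]] = {}
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            entry_id = row.pop(id_column)
            merged.setdefault(entry_id, {}).update(row)
    return merged


# In-memory stand-ins for data/data.csv and data/data2.csv.
data_csv = "id,energy\n1,-1.5\n2,-2.25\n"
data2_csv = "id,band_gap\n1,0.0\n2,1.1\n"

properties = load_properties([data_csv, data2_csv])
print(properties["1"])
```

Values are still strings at this stage; converting them to typed values is a separate step.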

Finally, definitions for the properties found in the `.csv` files can be
configured for enhanced sharing via OPTIMADE:

```yaml
entries:
  - entry_type: structures
    property_definitions:
      - name: energy
        title: Total energy per atom
        description: The total energy per atom as computed by GGA-DFT.
        unit: eV/atom
        type: float
```

These definitions enable database queries over the properties and make the data easier for other scientists to re-use.
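
One way to picture the role of a property definition is as a recipe for turning raw text values into typed values. The Python sketch below illustrates this under stated assumptions: the `CASTS` mapping and the helper `apply_definitions` are hypothetical, only the `float` type appears in the example above, and the real pipeline may handle types quite differently.

```python
from typing import Any, Callable

# Hypothetical mapping from a definition's `type` field to a Python cast.
# Only `float` appears in the example config; the others are assumptions.
CASTS: dict[str, Callable[[str], Any]] = {
    "float": float,
    "integer": int,
    "string": str,
}


def apply_definitions(
    raw: dict[str, str], definitions: list[dict[str, str]]
) -> dict[str, Any]:
    """Keep only the defined properties and cast them to the declared type."""
    typed: dict[str, Any] = {}
    for definition in definitions:
        name = definition["name"]
        if name in raw:
            typed[name] = CASTS[definition["type"]](raw[name])
    return typed


# The property definition from the config example, as a Python dict.
definitions = [
    {
        "name": "energy",
        "title": "Total energy per atom",
        "description": "The total energy per atom as computed by GGA-DFT.",
        "unit": "eV/atom",
        "type": "float",
    }
]

# A raw CSV row: values are strings, and `comment` has no definition.
typed = apply_definitions({"energy": "-1.5", "comment": "relaxed"}, definitions)
print(typed)
```

In this sketch, properties without a definition are dropped rather than passed through, which keeps the served data limited to fields that carry a title, unit, and type.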

This full example, along with more complex examples, can be found on GitHub at [materialscloud-org/archive-optimade-integration](https://github.com/materialscloud-org/archive-optimade-integration/tree/main/src/mc_optimade/examples/folder_of_cifs).