Add a first draft of the usage docs (#40)
* Add a first draft of the docs

* tiny changes

---------

Co-authored-by: Kristjan Eimre <kristjaneimre@gmail.com>
ml-evs and eimrek authored Oct 3, 2023
1 parent 3b261d8 commit f26ac86
Showing 2 changed files with 76 additions and 1 deletion.
4 changes: 3 additions & 1 deletion README.md
@@ -16,13 +16,15 @@ and hosted as [OPTIMADE APIs](https://optimade.org), enabling enhanced data disc
explorability.

This prototype repository contains two Python packages that work towards this
aim.
aim, as well as example scripts for deployment.

- `mc_optimade`: defines a config file format for annotating archives and registering the desired OPTIMADE entries, and a workflow for ingesting them and converting into OPTIMADE types using pre-existing parsers (e.g., ASE for structures). The archive is converted into an intermediate [OPTIMADE JSON Lines](https://github.com/Materials-Consortia/OPTIMADE/issues/471#issuecomment-1589274856) format that can be ingested into a database and used to serve a full OPTIMADE API.
- `optimade_launch`: provides a platform for launching an OPTIMADE API server
from such a JSON lines file. It does so using the
[`optimade-python-tools`](https://github.com/Materials-Consortia/optimade-python-tools/)
reference server implementation.
- `mcloud_implementation`: A set of tools and configuration used to deploy the
"archive watcher" and associated scrapers for MCA.

## Relevant links

73 changes: 73 additions & 0 deletions src/mcloud_implementation/docs/index.md
@@ -0,0 +1,73 @@
# Materials Cloud Archive OPTIMADE integration

Users can now specify that their Materials Cloud Archive (MCA) submissions be
hosted with an [OPTIMADE API](https://optimade.org), allowing structural (and
other) data to be queried by OPTIMADE clients.
This makes any structural data more discoverable, as structures and their
properties will be returned alongside results from other major [data
providers](https://www.optimade.org/providers-dashboard/), and additionally
enables future programmatic re-use of the data.
This approach has already found use in select cases where AiiDA databases were
exported and stored by MCA, and subsequently exposed with OPTIMADE APIs, but now
the functionality can be used on many common data types such as those understood
by [ASE](https://wiki.fysik.dtu.dk/ase/) and [pymatgen](https://pymatgen.org).

To enable this for an MCA submission, users must provide an additional config
file at the top level of their submission, named `optimade.yaml`.
The contents of this file will instruct the MCA data pipelines to ingest data
from supported formats, then create and expose a queryable database.
The full config file format, with examples, is described in the
[MCA-OPTIMADE integration GitHub repository](https://github.com/materialscloud-org/archive-optimade-integration/).

## Example

As a simple illustration of the functionality, let's say a user is submitting a
`.zip` file containing Crystallographic Information Files (CIFs) describing the
outputs of some calculations, along with a simple `.csv` file describing
computed properties of those crystals.

In this case, the config file first has to describe where the structural
data can be found, for example:

```yaml
entries:
  - entry_type: structures
    entry_paths:
      - file: structures.zip
        matches:
          - structures/cifs/*.cif
```
Here, ASE will be used to parse the CIF files, and the
[optimade-python-tools](https://github.com/Materials-Consortia/optimade-python-tools)
library will be used to construct an OPTIMADE structure object.
The location of the computed properties can then be defined in a similar way (continuing the `entries->entry_type` block):

```yaml
entries:
  - entry_type: structures
    property_paths:
      - file:
        matches:
          - data/data.csv
          - data/data2.csv
```
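
To give a feel for what the `matches` patterns select, here is a minimal Python sketch that applies a glob-style pattern to the members of a zip archive. It is an illustration only, not the pipeline's actual code: the helper `select_members`, the in-memory archive and its file names are all hypothetical, and the real matching rules used by the MCA pipeline may differ.

```python
import io
import zipfile
from pathlib import PurePosixPath


def select_members(archive: zipfile.ZipFile, patterns: list[str]) -> list[str]:
    """Return archive members whose path matches any of the glob patterns.

    Hypothetical helper: the actual pipeline may match paths differently.
    """
    selected = []
    for name in archive.namelist():
        if name.endswith("/"):
            continue  # skip directory entries
        path = PurePosixPath(name)
        # PurePosixPath.match: `*` matches within a single path component only
        if any(path.match(pattern) for pattern in patterns):
            selected.append(name)
    return sorted(selected)


# Build a small in-memory stand-in for structures.zip (contents are made up).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("structures/cifs/a.cif", "data_a\n")
    zf.writestr("structures/cifs/b.cif", "data_b\n")
    zf.writestr("structures/README.txt", "not a structure\n")

# Apply the pattern from the `entry_paths` example to the archive members.
with zipfile.ZipFile(buffer) as zf:
    cif_files = select_members(zf, ["structures/cifs/*.cif"])

print(cif_files)
```

In this sketch only the two `.cif` members are selected; the README is ignored because it does not match the pattern.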

Finally, definitions for the properties found in the `.csv` files can be
configured for enhanced sharing via OPTIMADE:

```yaml
entries:
  - entry_type: structures
    property_definitions:
      - name: energy
        title: Total energy per atom
        description: The total energy per atom as computed by GGA-DFT.
        unit: eV/atom
        type: float
```

These definitions enable database queries over the properties and make the data easier for other scientists to re-use.
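
As a rough illustration of why the declared `type` matters, the Python sketch below reads a property column from CSV text and casts it according to a definition like the one above, so that the values can be compared numerically. Everything beyond the `name`, `unit` and `type` fields is an assumption made for the example: the `id` column, the CSV values, the `CASTS` mapping and the `read_property` helper are hypothetical and do not describe how the MCA pipeline is implemented.

```python
import csv
import io

# Mirrors the `property_definitions` entry above; only `name` and `type` are used.
definition = {"name": "energy", "unit": "eV/atom", "type": "float"}

# Hypothetical CSV contents: the `id` column and the values are invented.
csv_text = "id,energy\nstructure_a,-3.52\nstructure_b,-4.10\n"

# Assumed mapping from declared type names to Python constructors.
CASTS = {"float": float, "integer": int, "string": str}


def read_property(text: str, definition: dict) -> dict:
    """Map each row's id to the property value, cast to the declared type."""
    cast = CASTS[definition["type"]]
    reader = csv.DictReader(io.StringIO(text))
    return {row["id"]: cast(row[definition["name"]]) for row in reader}


energies = read_property(csv_text, definition)

# With typed values, a numeric filter such as "energy < -4.0" becomes possible.
low_energy = sorted(key for key, value in energies.items() if value < -4.0)
print(low_energy)
```

Without the cast, the CSV values would remain strings and a comparison such as `energy < -4.0` would not be meaningful.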

This full example, along with more complex examples, can be found on GitHub at [materialscloud-org/archive-optimade-integration](https://github.com/materialscloud-org/archive-optimade-integration/tree/main/src/mc_optimade/examples/folder_of_cifs).
