Framework for accessing data resources, mapping data models, describing the data to ontologies, and performing data transformations
We highly recommend reading this page in the official documentation.
Important: As of v0.6.0, OTEAPI Core is no longer compatible with pydantic v1, but only with pydantic v2.
For more information about migrating your plugin repository to pydantic v2, see the pydantic documentation's migration guide.
Until the end of 2023, pydantic v1 will still be supported with security updates, but no new features will be added.
To keep using pydantic v1, one should use the v0.5.x versions of OTEAPI Core.
OTEAPI Core provides the core functionality of OTEAPI, which stands for the Open Translation Environment API.
It uses the strategy software design pattern to provide simple, easily extensible access to a large range of data resources. Semantic interoperability is supported by mapping the data models that describe the data to ontologies. OTEAPI Core provides a set of strategy interfaces, which can be considered abstract classes for implementing strategies, together with the data models used to configure them. This repository also contains implementations of several standard strategies, e.g., downloading files and parsing Excel documents. Transformations are mainly intended to convert data between representations, but they can also be used to run simulations in a simple workflow.
OTEAPI Core includes:
- A set of standard strategies;
- A plugin system for loading the standard strategies, as well as third party strategies;
- Data models for configuring the strategies;
- A Python library, through which the data can be accessed; and
- An efficient data cache module that avoids downloading the same content several times.
Download strategy patterns use a given protocol to download content into the data cache.
They are configured with the ResourceConfig data model, using the scheme of the downloadUrl field for strategy selection.
The configuration field can be used to configure how the downloaded content is stored in the cache using the DownloadConfig data model.
Standard download strategies: file, https, http, sftp, ftp
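As a rough illustration of scheme-based selection, the sketch below extracts the scheme from a downloadUrl with the standard library and checks it against the standard schemes listed above. The helper `select_download_strategy`, the `resource_config` dict, and the example URL are hypothetical and are not part of the OTEAPI Core API; only the field names and the scheme list come from this page.

```python
from urllib.parse import urlsplit

# Schemes of the standard download strategies listed above.
STANDARD_DOWNLOAD_SCHEMES = {"file", "https", "http", "sftp", "ftp"}


def select_download_strategy(resource_config: dict) -> str:
    """Return the URL scheme that would select a download strategy.

    Hypothetical helper for illustration only; not OTEAPI Core code.
    """
    scheme = urlsplit(resource_config["downloadUrl"]).scheme
    if scheme not in STANDARD_DOWNLOAD_SCHEMES:
        raise ValueError(f"No standard download strategy for scheme {scheme!r}")
    return scheme


# Hypothetical configuration using field names from the ResourceConfig data model.
resource_config = {
    "downloadUrl": "https://example.org/data/sample.json",
    "mediaType": "application/json",
}
selected_scheme = select_download_strategy(resource_config)
print(selected_scheme)
```

In OTEAPI Core itself the lookup goes through the plugin system rather than a hard-coded set; the set here only stands in for that registry.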
Parse strategy patterns convert content from the data cache to a Python dict.
Like download strategies, they are configured with the ResourceConfig data model, using the mediaType field for strategy selection.
Additional strategy-specific configurations can be provided via the configuration field.
Standard parse strategies: application/json, image/jpg, image/jpeg, image/jp2, image/png, image/gif, image/tiff, image/eps, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, application/vnd.sqlite3
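The idea of selecting a parser by media type can be sketched with a plain dictionary acting as the registry. This is a minimal illustration of the strategy pattern, assuming the cached content is available as bytes; the names `PARSERS` and `parse_cached_content` are hypothetical and do not mirror the actual OTEAPI Core classes.

```python
import json
from typing import Callable, Dict

# Hypothetical registry: media type -> function turning cached bytes into a dict.
PARSERS: Dict[str, Callable[[bytes], dict]] = {
    "application/json": lambda content: json.loads(content.decode("utf-8")),
}


def parse_cached_content(media_type: str, content: bytes) -> dict:
    """Pick a parser by media type and return the parsed content."""
    try:
        parser = PARSERS[media_type]
    except KeyError:
        raise ValueError(
            f"No parser registered for media type {media_type!r}"
        ) from None
    return parser(content)


parsed = parse_cached_content("application/json", b'{"temperature": 21.5}')
print(parsed)
```

Adding support for another media type then only requires registering one more entry, which is the extensibility the strategy pattern is meant to give.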
Resource strategy patterns can retrieve data from, and upload data to, external data services.
They are configured with the ResourceConfig data model, using the scheme of the accessUrl and accessService fields.
The scheme of the accessUrl is used for strategy selection.
Mapping strategies map fields/properties in data models to ontological concepts.
Filter strategies can update the configuration of other strategies. They can also update values in the data cache.
Standard filter strategies: filter/crop, filter/sql
Function strategies are synchronous transformations that (normally) run directly on the server hosting the OTE service.
Transformation strategies are a special form of a function strategy intended for long-running transformations. In this sense, they represent asynchronous functions running in the background or on external resources.
Standard transformation strategies: celery/remote
The transformation strategy has consolidated the execution of the transformation with the get() method to unify the strategy interfaces.
get() is intended to start an asynchronous process and return a task_id, which can be queried using the status() method (outside of a pipeline).
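The get()/status() interplay can be pictured with a small toy class built on the standard library. This is only a sketch of the idea: the class `BackgroundTransformation`, its `wait()` helper, and the status strings are assumptions made for illustration, not the OTEAPI Core interface or the celery/remote strategy.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class BackgroundTransformation:
    """Toy stand-in for a transformation strategy; not the OTEAPI Core interface."""

    def __init__(self, job: Callable[[], object]) -> None:
        self._job = job
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._tasks: Dict[str, Future] = {}

    def get(self) -> dict:
        """Start the job in the background and return its task_id immediately."""
        task_id = str(uuid.uuid4())
        self._tasks[task_id] = self._executor.submit(self._job)
        return {"task_id": task_id}

    def status(self, task_id: str) -> str:
        """Report the state of a task (status names are hypothetical)."""
        future = self._tasks[task_id]
        if not future.done():
            return "RUNNING"
        return "FAILURE" if future.exception() else "SUCCESS"

    def wait(self, task_id: str) -> object:
        """Block until the task finishes and return its result."""
        return self._tasks[task_id].result()


transformation = BackgroundTransformation(lambda: sum(range(1000)))
task_id = transformation.get()["task_id"]
result = transformation.wait(task_id)
final_status = transformation.status(task_id)
print(final_status, result)
```

The caller gets the task_id back without waiting for the job, and polls status() separately, which is what lets long-running work sit outside a pipeline run.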
The way strategies are registered and found is through entry points.
Special group names identify the strategy type, and the entry point values indicate what kind of strategy a specific class implements. A full overview of recognized entry point group names can be seen in Table of entry point strategies.
In the following examples, let's imagine we have a package that is importable in Python as my_plugin and contains two download strategies and a single parse strategy:
- A peer-2-peer download strategy, implemented in a class named Peer2PeerDownload, importable from my_plugin.strategies.download.peer_2_peer.
- A MongoDB download strategy, implemented in a class named MongoRetrieve, importable from my_plugin.strategies.mongo.
- A MongoDB parse strategy, implemented in a class named MongoParse, importable from my_plugin.strategies.mongo.
There are several ways to let the Python environment know of these strategies through entry points.
In the package's setup.py file, one can specify entry points.
Here, an example snippet is shown using setuptools:
# setup.py
from setuptools import setup

setup(
    # ...,
    entry_points={
        "oteapi.download": [
            "my_plugin.p2p = my_plugin.strategies.download.peer_2_peer:Peer2PeerDownload",
            "my_plugin.mongo = my_plugin.strategies.mongo:MongoRetrieve",
        ],
        "oteapi.parse": [
            "my_plugin.application/vnd.mongo+json = my_plugin.strategies.mongo:MongoParse",
        ],
    },
)
Alternatively, use custom files (for example, YAML or JSON) that are later parsed and used in a setup.py file.
entry_points:
  oteapi.download:
    - "my_plugin.p2p = my_plugin.strategies.download.peer_2_peer:Peer2PeerDownload"
    - "my_plugin.mongo = my_plugin.strategies.mongo:MongoRetrieve"
  oteapi.parse:
    - "my_plugin.application/vnd.mongo+json = my_plugin.strategies.mongo:MongoParse"

{
  "entry_points": {
    "oteapi.download": [
      "my_plugin.p2p = my_plugin.strategies.download.peer_2_peer:Peer2PeerDownload",
      "my_plugin.mongo = my_plugin.strategies.mongo:MongoRetrieve"
    ],
    "oteapi.parse": [
      "my_plugin.application/vnd.mongo+json = my_plugin.strategies.mongo:MongoParse"
    ]
  }
}
A more modern approach is to use setup.cfg or pyproject.toml.
[options.entry_points]
oteapi.download =
    my_plugin.p2p = my_plugin.strategies.download.peer_2_peer:Peer2PeerDownload
    my_plugin.mongo = my_plugin.strategies.mongo:MongoRetrieve
oteapi.parse =
    my_plugin.application/vnd.mongo+json = my_plugin.strategies.mongo:MongoParse
As seen above, there are a few different syntactical flavors of how to list the entry points. However, the "value" stays the same throughout.
The general syntax for entry points is based on ini files and parsed using the built-in configparser module, which is described in the Python standard library documentation.
Specifically for entry points the nomenclature is the following:
[options.entry_points]
GROUP =
    NAME = VALUE
The VALUE is then further split into: PACKAGE.MODULE:OBJECT.ATTRIBUTE [EXTRA1, EXTRA2].
From the general syntax outlined above, OTEAPI Core then implements rules and requirements regarding the syntax for strategies.
- A class MUST be specified (as an OBJECT).
- The NAME MUST consist of exactly two parts: PACKAGE and the strategy type value, in the form PACKAGE.STRATEGY_TYPE_VALUE.
- The GROUP MUST be a valid OTEAPI entry point group, see Table of entry point strategies for a full list of valid OTEAPI entry point group values.
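The NAME and GROUP rules above can be checked with a few lines of Python. This is a sketch under one stated assumption: the NAME is split at its first dot, since strategy type values such as application/vnd.mongo+json contain dots themselves. The function `split_entry_point_name` is hypothetical and is not how OTEAPI Core necessarily validates entry points.

```python
from typing import Tuple

# Valid groups, taken from the table of entry point strategies on this page.
VALID_GROUPS = {
    "oteapi.download",
    "oteapi.filter",
    "oteapi.function",
    "oteapi.mapping",
    "oteapi.parse",
    "oteapi.resource",
    "oteapi.transformation",
}


def split_entry_point_name(group: str, name: str) -> Tuple[str, str]:
    """Return (PACKAGE, STRATEGY_TYPE_VALUE) or raise ValueError."""
    if group not in VALID_GROUPS:
        raise ValueError(f"{group!r} is not a valid OTEAPI entry point group")
    # Assumption: split at the first dot only, so dots in the
    # strategy type value are preserved.
    package, separator, strategy_type_value = name.partition(".")
    if not (package and separator and strategy_type_value):
        raise ValueError(
            f"{name!r} is not of the form PACKAGE.STRATEGY_TYPE_VALUE"
        )
    return package, strategy_type_value


parts = split_entry_point_name(
    "oteapi.parse", "my_plugin.application/vnd.mongo+json"
)
print(parts)
```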
To understand what the strategy type value should be, see Table of entry point strategies.
Strategy Type Name | Strategy Type Value | Entry Point Group | Documentation Reference |
---|---|---|---|
Download | scheme | oteapi.download | Download strategy |
Filter | filterType | oteapi.filter | Filter strategy |
Function | functionType | oteapi.function | Function strategy |
Mapping | mappingType | oteapi.mapping | Mapping strategy |
Parse | mediaType | oteapi.parse | Parse strategy |
Resource | accessService | oteapi.resource | Resource strategy |
Transformation | transformationType | oteapi.transformation | Transformation strategy |
- OTEAPI Services - a RESTful interface to OTEAPI Core
- OTELib - a Python interface to OTEAPI Services
- OTEAPI Plugin Template - a cookiecutter template for OTEAPI Plugins
OTEAPI Core can be installed with:
pip install oteapi-core
To set up a developer environment for OTEAPI Core, clone the repository from GitHub and install it in editable mode:
git clone https://github.com/EMMC-ASBL/oteapi-core /path/to/oteapi-core
pip install -U --upgrade-strategy=eager -e /path/to/oteapi-core[dev]
Note, /path/to/oteapi-core can be left out of the first line, but then it must be updated in the second line, either to ./oteapi-core (or oteapi-core), or to . if you cd into the generated folder into which the repository has been cloned.
The --upgrade-strategy=eager part can be left out.
We recommend installing within a dedicated virtual environment.
To test the installation, you can run:
cd /path/to/oteapi-core
pytest
If you run into issues at this stage, please open an issue.
Docker is an effective tool for creating portable, isolated environments for your applications. Here's an example of setting up a PostgreSQL instance using Docker:
- Create a Docker volume: Docker volumes enable data to persist across uses of Docker containers. In this context, we create a volume called pgdata to store database data.
docker volume create pgdata
- Start a Docker container: Use the docker run command to initiate a new Docker container using the postgres image. Here's a breakdown of the options used in the command:
  - -d: Runs the container in the background (detached mode), freeing up your terminal.
  - --name postgres: Names the container postgres, allowing it to be referenced in future Docker commands.
  - -e POSTGRES_PASSWORD=postgres: Sets an environment variable in the container to specify the PostgreSQL database password as postgres.
  - -p 5432:5432: Maps port 5432 of the container to port 5432 of the host machine, letting applications on the host connect to the PostgreSQL database in the container.
  - -v pgdata:/var/lib/postgresql/data: Mounts the pgdata volume at the path /var/lib/postgresql/data inside the container, which is the storage location for PostgreSQL data files.
  - --restart always: Ensures the container restarts whenever it stops, unless it is manually stopped, in which case it only restarts when the Docker daemon starts, usually on system boot.
docker run -d --name postgres \
  -e POSTGRES_PASSWORD=postgres \
  -p 5432:5432 \
  -v pgdata:/var/lib/postgresql/data \
  --restart always postgres
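Once the container is running, a quick way to confirm that the published port is reachable from the host is a plain TCP connection attempt. The sketch below uses only the standard library; the helper `is_port_open` is hypothetical, and a successful connection only shows that something is listening on the port, not that PostgreSQL accepts the credentials.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


# 5432 is the host port published by the docker run command above.
print(is_port_open())
```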
OTEAPI Core is released under the MIT license with copyright © SINTEF.
OTEAPI Core has been supported by the following projects:
- OntoTrans (2020-2024) receives funding from the European Union’s Horizon 2020 Research and Innovation Programme, under Grant Agreement no. 862136.
- VIPCOAT (2021-2025) receives funding from the European Union’s Horizon 2020 Research and Innovation Programme - DT-NMBP-11-2020 Open Innovation Platform for Materials Modelling, under Grant Agreement no. 952903.
- OpenModel (2021-2025) receives funding from the European Union’s Horizon 2020 Research and Innovation Programme - DT-NMBP-11-2020 Open Innovation Platform for Materials Modelling, under Grant Agreement no. 953167.
- MatCHMaker (2022-2026) receives funding from the European Union’s Horizon Europe Research and Innovation Programme, under Grant Agreement no. 101091687.