Skip to content

Latest commit

 

History

History
767 lines (555 loc) · 29.4 KB

README.md

File metadata and controls

767 lines (555 loc) · 29.4 KB

CORDS Minimum Viable Data Space

CORDS MVDS has several components. These components can be downloaded as docker images for the deployment.

  • True Connector Components, open-source IDSA Connector designed by ENG. It is leveraged in CORDS MVE for ensuring IDSA complaint artifact exchange. It consist of several services including Execution Core Container (ECC) and Usage-Control (UC) Data Application.

  • CORDS Back End Data App, A modified version of the ENG Basic Data App to taylor to address the requirments of CORDS. It act as the midleware between the CORDS local and global digital threads.

  • Meta Data Broker, Implementation of an IDS Metadata Broker, which is a registry for IDS Connector self-description documents. It is currenly under development by the IDSA community.

  • CORDS Resource Manager, It is an API provided to manage artifiact from the ML Flow and genearate their semantic descriptions. Furthermore it can be used to push artifacts into the IDS connector ecosystem.

  • ML Semantic Library, Cords-Semantics is a Python library designed for tagging the artefacts of MLflow runs and generating IDSA-compatible semantic descriptions for artefacts using the ontology provided by CORDS.

  • MLFlow, Open source platform for the machine learning lifecycle. It can be used to manage the ML workflows and manage the artifacts geneated from every training rounds.

MVE

Index

The installation and configuration process is explained below for each of the components.

Hardware Requirements

In this section the minimum requirements required for operating the TRUEConnector-MVDS are detailed.

The current minimum requirements for the TRUEConnector-MVDS are:

  • 4 GB RAM (however 8GB RAM is recommended)
  • 50 GB storage

It is recommended to use 64bit quad core processor to provide enough processing power for all docker containers.

Take into account that if more components are included at the TRUEConnector-MVDS or a huge amount of data is uploaded it is possible to run out of disk free space. In this cases it is recommended to provide more free disk storage.

Running Preconfigured MVE

The CORDS MVE is already preconfigured out-of-the-box for testing purposes and you can start it by executing in the root folder where the docker-compose.yml is located:

docker-compose up -d

To see the log lines:

docker-compose logs -f

Exploring CORDS MVE

ML Flow

ML Flow UI can be access from you browser http://localhost:4000/. This can be used to track your assets in ML experiments. CORDS semantic library can be utilized to describe your ML assets using the CORDS ontology. This will ensure the semantic interoprability of ML assets that you will share on the data space. An example for connecting into the MLFLow and using semantic lib is given below.

#for setting up mlflow local server
mlflow.set_tracking_uri('http://localhost:4000')

mlflow.set_experiment("energy-prediction-regression-problem")

#Importing CORDS Tags
import cords_semantics.tags as cords_tags



with mlflow.start_run(run_name = run_name) as mlflow_run:
    mlflow_run_id = mlflow_run.info.run_id
    
    mlflow.set_experiment_tag("second best_model", "K-NeighborsRegressor")
    mlflow.set_tag("tag2", "K-NeighborsRegressor")
    mlflow.set_tag(cords_tags.CORDS_RUN, mlflow_run_id)
    mlflow.set_tag(cords_tags.CORDS_RUN_EXECUTES, "K-NeighborsRegressor")
    mlflow.set_tag(cords_tags.CORDS_IMPLEMENTATION, "python")
    mlflow.set_tag(cords_tags.CORDS_SOFTWARE, "sklearn")
    
    mlflow.sklearn.log_model(models['K-Neighbors Regressor'], "knnmodel")
        
    mlflow.log_metric("test_RMSE", rmse_scores.loc[rmse_scores['Model Name'] == 'K-Neighbors Regressor', 'RMSE_Score'].values[0])
    mlflow.log_metric("test_MAE", mae_scores.loc[mae_scores['Model Name'] == 'K-Neighbors Regressor', 'MAE_Score'].values[0])
    mlflow.log_metric("test_R2_Score", r2_scores.loc[r2_scores['Model Name'] == 'K-Neighbors Regressor', 'R2_Score'].values[0])   
    
    mlflow.log_input(dataset, context="training")
    
    
    print("MLFlow Run ID: ", mlflow_run_id)

CORDS Resource Manager

This API allows the user to prepare the assets to be shared on the Data Space using IDS connectors. It has a feature to extract metadata from MLFlow and convert to a IDS compatible resource description. Finally, when the contract negotation is done ML asset trasfer can be initiated from the API. Once this service is started the API documentation can be accessed from http://localhost:5000/docs. Use the Resource Manager Postman Collection to interact with the API.

1. Registering a new user for managing ML assets at the provider end.

curl -X POST http://localhost:5000/api/users/register \
-H "Content-Type: application/json" \
-H "User-Agent: PostmanRuntime/7.39.0" \
-H "Accept: */*" \
-H "Cache-Control: no-cache" \
-H "Host: localhost:5000" \
-H "Accept-Encoding: gzip, deflate, br" \
-d '{
    "email": "tharindu.prf@gmail.com",
    "password": "password123",
    "first_name": "Tharindu",
    "last_name": "Ranathunga",
    "role": "ML Engineer"
}'

2. Get auth token to authenticte the other API calls.

The base64 encorded user name password is added as the header here.

curl -X POST http://localhost:5000/api/users/get-auth-token \
  -H 'Authorization: Basic dGhhcmluZHUucHJmQGdtYWlsLmNvbTpwYXNzd29yZDEyMw==' \
  -H 'User-Agent: PostmanRuntime/7.39.0' \
  -H 'Accept: */*' \
  -H 'Cache-Control: no-cache' \
  -H 'Host: localhost:5000' \
  -H 'Accept-Encoding: gzip, deflate, br' \
  -H 'Connection: keep-alive' \
  -H 'Content-Length: 0'

Use the response token of this call to authenticate other calls.

3. Register a ML Model as an asset to be shared in the data space.

More details about the request body can be found in the API doc

curl -X POST http://localhost:5000/api/ml_models/add_model \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer LNUDU7YPEeYPcXPXqZPF8jDEiFrVnFmwLTbwYwzxrF8' \
  -H 'User-Agent: PostmanRuntime/7.39.0' \
  -H 'Accept: */*' \
  -H 'Cache-Control: no-cache' \
  -H 'Host: localhost:5000' \
  -H 'Accept-Encoding: gzip, deflate, br' \
  -H 'Connection: keep-alive' \
  -H 'Content-Length: 224' \
  -d '{
    "name": "Test Model 1",
    "version": "1.0",
    "description": "This model is a test model",
    "ml_flow_model_path": "mlflow-artifacts:/208444466607110357/2c8c2fc3fefe4c56a29ac325c0bac39c/artifacts/knnmodel"
}'

4. Register a data space connector for sharing assets

curl -X POST http://localhost:5000/api/dataspace_connector/add_connector \
  -H 'Content-Type: application/json' \
  -H 'Accept: */*' \
  -H 'Cache-Control: no-cache' \
  -H 'Accept-Encoding: gzip, deflate, br' \
  -H 'Connection: keep-alive' \
  -H 'Content-Length: 504' \
  -d '{
    "id": "https://w3id.org/engrd/connector/provider21",
    "name": "CORDS True Connector",
    "type": "ids:BaseConnector",
    "description": "Data Provider Connector description",
    "public_key": "TUlJREdqQ0NBcCtnQXdJQkFnSUJBVEFLQmdncWhrak9QUVFEQWpCTk1Rc3dDUVlEVlFRR0V3SkZVekVNTUFvR0ExVUVDZ3dEVTFGVE1SQXdEZ1lEVlFRTERBZFVaWE4wVEdGaU1SNHdIQVlEVlFRRERCVlNaV1psY21WdVkyVlVaWE4wWW1Wa1U=",
    "access_url": "https://89.19.88.88:8449/",
    "reverse_proxy_url": "https://localhost:8184/proxy"
}'

4. Creating a resource to be shared in data space.

This could be a ML model, Raw Data Dump or a Federated Learning Training (Refered by the asset_id). In this version only ML models are supported to be registered as a resource. This also link to the prefered data space connector that the resource will be shared.

curl -X POST http://localhost:5000/api/dataspace_resource/create_resource \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer LNUDU7YPEeYPcXPXqZPF8jDEiFrVnFmwLTbwYwzxrF8' \
  -H 'User-Agent: PostmanRuntime/7.39.0' \
  -H 'Accept: */*' \
  -H 'Cache-Control: no-cache' \
  -H 'Host: localhost:5000' \
  -H 'Accept-Encoding: gzip, deflate, br' \
  -H 'Connection: keep-alive' \
  -H 'Content-Length: 224' \
  -d '{
    "asset_id": "344f0e124bfb7363651bb080c3ca36f43a23094ab6566e1943f7592b7ff620e9",
    "connector_id": "https://w3id.org/engrd/connector/provider21",
    "resource_id": "1b2888f8c6032ee0223373cab9c62380f594e22435170b9cad8a62769d8810ea",
    "timestamp": "2024-05-31T13:37:55.708378",
    "type": "model"
}'

5. Register the resource on IDS Connector.

This will create the resource description using the IDS Information model. Moreover, metadata of the shared resource (Eg: ML semantics) are embeded to the resource description using the CORDS ontology.

curl -X POST http://localhost:5000/api/dataspace_connector/register_resource/1b2888f8c6032ee0223373cab9c62380f594e22435170b9cad8a62769d8810ea \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer LNUDU7YPEeYPcXPXqZPF8jDEiFrVnFmwLTbwYwzxrF8' \
  -H 'User-Agent: PostmanRuntime/7.39.0' \
  -H 'Accept: */*' \
  -H 'Cache-Control: no-cache' \
  -H 'Host: localhost:5000' \
  -H 'Accept-Encoding: gzip, deflate, br' \
  -H 'Connection: keep-alive' \
  -H 'Content-Length: 224' \
  -d '{
    "title": "Example IDS Resource1",
    "description": "This is an example IDS Resource",
    "keywords": ["cords", "energy prediction"],
    "catalog_id": "https://w3id.org/idsa/autogen/resourceCatalog/1ce75044-fd7d-4002-9117-051c7005f4ba"
}'

Data Space Connectors

Use the Rest API provided by the True Connector to interact with the IDS components. This can be used to perform the contract negotation and initiate the asset transfer. Once the contract negoting is done following API call at the consumer connector can be invoked to initate the model transfer. This is a True Connector specific API call.

curl -X POST https://localhost:8184/proxy \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Basic aWRzVXNlcjpwYXNzd29yZA==' \
  -H 'User-Agent: PostmanRuntime/7.39.0' \
  -H 'Accept: */*' \
  -H 'Cache-Control: no-cache' \
  -H 'Postman-Token: a22a80b8-f11a-4a0f-8037-258eb4cb3df5' \
  -H 'Host: localhost:8184' \
  -H 'Accept-Encoding: gzip, deflate, br' \
  -H 'Connection: keep-alive' \
  -H 'Content-Length: 463' \
  -d '{
    "multipart": "form",
    "Forward-To": "https://ecc-provider:8889/data",
    "messageType": "ArtifactRequestMessage",
    "requestedArtifact": "http://w3id.org/engrd/connector/artifact/8f840986c277d47f3535d47f0d3bdb1652cd988baa55df788deb253f20f2d974",
    "transferContract": "https://w3id.org/idsa/autogen/contractAgreement/65c27aa7-d334-4469-9ad0-e491f512a75b",
    "payload" : {
        "consumer_ip": "127.0.0.1",
        "consumer_port": "8765"
    }
}'