Definition of submodelDescriptor to support parquet files #942

Open
3 tasks
thomas-henn opened this issue Jun 21, 2024 · 7 comments
Assignees
Labels
enhancement New feature or request

Comments

@thomas-henn

thomas-henn commented Jun 21, 2024

Description

The description of a submodel descriptor in an Asset Administration Shell should be given, with respect to
"interface" (https://admin-shell-io.github.io/aas-specs-antora/IDTA-01002/v3.1/specification/interfaces-payload.html#_endpoint),
"semanticId"
for the use of parquet files.

Acceptance Criteria

  • [criteria 1]
  • [criteria 2]
  • [criteria 3]

Additional Information

Child of Feature: eclipse-tractusx/sig-release#721
linked to: eclipse-tractusx/sldt-semantic-models#762
Possible solutions: definition_submodeldescriptors_parquet.md

@arnoweiss
Contributor

This might work similarly to the PCF use-case, where the submodelDescriptor.endpoint.interface property is used as the discriminator for how to access the "submodel". Afaik, parquet is a compression technology, so there'd have to be a very precise spec on:

  1. how to obtain the file (http/s3/?) and how to parameterize that call
  2. how to decompress it
  3. how to parse the plaintext payload into a domain-model (like a SAMM aspect)

I'm interested in contributing here.
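
The discriminator idea could be sketched roughly as below. This is only an illustration: the handler functions and the `plan_access` name are hypothetical, and the interface values are simply the ones that appear in this thread.

```python
from typing import Callable, Dict


def fetch_submodel_json(endpoint: dict) -> str:
    # Hypothetical handler: a plain submodel call that returns JSON.
    return f"GET JSON from {endpoint['protocolInformation']['href']}"


def fetch_attachment(endpoint: dict) -> str:
    # Hypothetical handler: download a binary attachment (e.g. a parquet file).
    return f"download file from {endpoint['protocolInformation']['href']}"


# Map each interface value to the strategy used to access the data.
HANDLERS: Dict[str, Callable[[dict], str]] = {
    "SUBMODEL-3.0": fetch_submodel_json,
    "SUBMODEL-ATTACHMENT-3.0": fetch_attachment,
}


def plan_access(endpoint: dict) -> str:
    """Pick an access strategy based on the endpoint's interface value."""
    interface = endpoint.get("interface", "")
    handler = HANDLERS.get(interface)
    if handler is None:
        raise ValueError(f"unsupported interface: {interface!r}")
    return handler(endpoint)
```

A consumer would call `plan_access` once per entry in `endpoints` and skip the entries it cannot handle.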

@tunacicek
Contributor

tunacicek commented Jul 17, 2024

@BirgitBoss : Thanks for your input.
Four possible solutions (which need to be evaluated) to deliver parquet files:

  1. Define the parquet file in the aspect as BLOB type and add it to the payload (this could be very large).
  2. Define the parquet file in the aspect as File type and add the path to the parquet file where the requester can download it. (Two steps are needed to download the file.)
  3. Define only the meta information, like the link to the FTP server etc., in the aspect.
  4. Use the API /submodel/submodel-elements/{idShortPath}/attachment, which delivers a zip.

@arnoweiss
Contributor

Is there an explicit requirement

  • to have the submodelDescriptor point to an S3/BlobStorage resource?
  • to have (bidirectional) transformation rules from a nestable format (like SAMM) to a tabular format?

I'm inclined to reuse as much of the href/subprotocolBody mechanism from the SUBMODEL-3.0 interface as possible assuming that access will always be negotiated via DSP catalogs.

@tunacicek tunacicek moved this from Todo to In Progress in 🚀SLDT Board Jul 25, 2024
@tunacicek
Contributor

tunacicek commented Aug 1, 2024

Hi @arnoweiss ,

  1. There is no explicit requirement to use file transfer via a bucket. But for larger files, it makes sense to use that transfer.
  2. We assume we will create a new model that maps the flattened hierarchy to the columns in the parquet file (with rules like using "_" or "."). Therefore we have the second story:
    [New Model]: Asepct Model to Handle Parquet Files sldt-semantic-models#762

I added a md file in the description (Possible solutions) which includes two solutions on how to define the SubModelDescriptors.
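
The flattening rule mentioned in point 2 could look roughly like the sketch below. The function name `flatten` and the choice to leave lists untouched are assumptions made for illustration; the actual rules are to be defined in sldt-semantic-models#762.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], sep: str = "_", prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into a single level of column names.

    Nested keys are joined with `sep` ("_" or "."). Lists are kept as
    values unchanged; how to map them to columns is an open question.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        column = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the column prefix along.
            flat.update(flatten(value, sep, column))
        else:
            flat[column] = value
    return flat
```

Note that a "_" separator becomes ambiguous when property names themselves contain underscores, which is one reason the separator is kept as a parameter here.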

@tunacicek
Contributor

tunacicek commented Aug 1, 2024

See also md file:
concept-parquetfile-shell.md

Solution 1: File transfer via AASX file

If the file size is less than 100 GB, it can be transferred via HTTPS in AASX format. For this, the Parquet file must be included within an AASX file. The submodelDescriptor for this case can be defined as follows:

{
   ...
   "submodelDescriptors": [
      {
         "idShort": "quality-data",
         "id": "<uuid>",
         "endpoints": [
            {
               "interface": "AASX-FILE-3.0",
               "protocolInformation": {
                  "href": "https://<provider-edc-dataplane-url>/<path>/package/<base64url-encoded unique ID of submodel>",
                  "endpointProtocol": "HTTP",
                  "endpointProtocolVersion": [
                     "1.1"
                  ],
                  "subprotocol": "",
                  "subprotocolBody": "id=<edc-asset-id>;dspEndpoint=<controlplane-url>",
                  "subprotocolBodyEncoding": "application/octet-stream",
                  "securityAttributes": [
                     {
                        "type": "NONE",
                        "key": "NONE",
                        "value": "NONE"
                     }
                  ]
               }
            }
         ],
         "semanticId": {
            "type": "ExternalReference",
            "keys": [
               {
                  "type": "GlobalReference",
                  //TBD: New model which maps the flattened hierarchy to the model (e.g. with "_" or ".")
                  "value": "urn:samm:io.catenax.vehicle.product_description:3.0.0#FlattenedProductDescription"
               }
            ]
         },
         "description": [
            {
               "language": "en",
               "text": "submodel-descriptor for quality data which will be transferred via AASX file."
            }
         ]
      }
   ]
}
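
As a rough illustration of the `href` in this descriptor, the sketch below builds the `/package/<base64url-encoded ID>` URL. The function names are hypothetical, and stripping the `=` padding is an assumption that should be checked against the AAS API specification version in use.

```python
import base64


def encode_submodel_id(submodel_id: str, strip_padding: bool = True) -> str:
    """Base64url-encode a submodel ID for use as a URL path segment.

    strip_padding=True is an assumption, not something this thread specifies.
    """
    encoded = base64.urlsafe_b64encode(submodel_id.encode("utf-8")).decode("ascii")
    return encoded.rstrip("=") if strip_padding else encoded


def package_href(dataplane_url: str, path: str, submodel_id: str) -> str:
    """Assemble <dataplane-url>/<path>/package/<encoded-id> as in the descriptor."""
    base = dataplane_url.rstrip("/")
    middle = path.strip("/")
    return f"{base}/{middle}/package/{encode_submodel_id(submodel_id)}"
```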

AASX File example

An example AASX file structure can look like:

qualitydata.aasx
│   README.md  
│───aasx
      └───boschqualitydata
      │   │ boschqualitydata.xml
      │   
      └───Document
          │   CX_release3_qax_testdata_qualitytask_v1.1.parquet

The reference inside 'boschqualitydata.xml' can look like:

<?xml version="1.0"?>
<aas:aasenv xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:IEC61360="http://www.admin-shell.io/IEC61360/1/0" xsi:schemaLocation="http://www.admin-shell.io/aas/1/0 AAS.xsd http://www.admin-shell.io/IEC61360/1/0 IEC61360.xsd" xmlns:aas="http://www.admin-shell.io/aas/1/0">
  <aas:assetAdministrationShells>
  </aas:assetAdministrationShells>
  <aas:assets>
    <aas:asset>
      <aas:idShort>quality-data</aas:idShort>
      <aas:identification idType="URI">http://example.de</aas:identification>
      <aas:kind>Instance</aas:kind>
    </aas:asset>
  </aas:assets>
  <aas:submodels>
    <aas:submodel>
      <aas:idShort>Document</aas:idShort>
      <aas:identification idType="URI">http://bosch-quality-data.com/shells/R055732577/1012160102010001/submodels/document/</aas:identification>
      <aas:semanticId>
        <aas:keys>
          <aas:key type="GlobalReference" local="false" idType="value">urn:samm:io.catenax.vehicle.product_description:3.0.0#MetaInformation</aas:key>
        </aas:keys>
      </aas:semanticId>
      <aas:kind>Instance</aas:kind>
      <aas:qualifier />
      <aas:submodelElements>
        <aas:submodelElement>
          <aas:submodelElementCollection>
            <aas:idShort>Quality_data</aas:idShort>
            <aas:category>PARAMETER</aas:category>
            <aas:semanticId>
              <aas:keys>
                <aas:key type="GlobalReference" local="false" idType="value">urn:samm:io.catenax.vehicle.product_description:3.0.0#MetaInformation</aas:key>
              </aas:keys>
            </aas:semanticId>
            <aas:kind>Instance</aas:kind>
            <aas:qualifier />
            <aas:value>
              <aas:submodelElement>
                <aas:file>
                  <aas:idShort>File</aas:idShort>
                  <aas:category>PARAMETER</aas:category>
                  <aas:semanticId>
                    <aas:keys>
                      <aas:key type="ConceptDescription" local="true" idType="IRDI">0173-1#02-AAD005#008</aas:key>
                    </aas:keys>
                  </aas:semanticId>
                  <aas:kind>Instance</aas:kind>
                  <aas:qualifier />
                  <aas:mimeType>application/parquet</aas:mimeType>
                  <aas:value>/aasx/Document/CX_release3_qax_testdata_qualitytask_v1.1.parquet</aas:value>
                </aas:file>
              </aas:submodelElement>
            </aas:value>
            <aas:ordered>false</aas:ordered>
            <aas:allowDuplicates>false</aas:allowDuplicates>
          </aas:submodelElementCollection>
        </aas:submodelElement>
      </aas:submodelElements>
    </aas:submodel>
  </aas:submodels>
</aas:aasenv>
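
A consumer that has unpacked the AASX package could locate the parquet file by scanning the XML for `file` elements with the matching MIME type. The following is a minimal sketch using only the standard library; `SAMPLE` is a shortened stand-in for the XML above, and `find_files` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace as declared in the example XML above.
AAS_URI = "http://www.admin-shell.io/aas/1/0"
AAS_NS = {"aas": AAS_URI}

SAMPLE = """<aas:aasenv xmlns:aas="http://www.admin-shell.io/aas/1/0">
  <aas:submodels>
    <aas:submodel>
      <aas:submodelElements>
        <aas:submodelElement>
          <aas:file>
            <aas:idShort>File</aas:idShort>
            <aas:mimeType>application/parquet</aas:mimeType>
            <aas:value>/aasx/Document/CX_release3_qax_testdata_qualitytask_v1.1.parquet</aas:value>
          </aas:file>
        </aas:submodelElement>
      </aas:submodelElements>
    </aas:submodel>
  </aas:submodels>
</aas:aasenv>"""


def find_files(xml_text: str, mime_type: str = "application/parquet") -> List[str]:
    """Return the package-internal paths of all file elements with this MIME type."""
    root = ET.fromstring(xml_text)
    paths: List[str] = []
    # iter() walks the whole tree, so nesting depth does not matter.
    for file_el in root.iter(f"{{{AAS_URI}}}file"):
        if file_el.findtext("aas:mimeType", namespaces=AAS_NS) == mime_type:
            value = file_el.findtext("aas:value", namespaces=AAS_NS)
            if value:
                paths.append(value)
    return paths
```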

Solution 2: File transfer via async. API AASX file

Issue is created: admin-shell-io/aas-specs-api#347
Status: In progress

Solution 3: File transfer via HTTPS

If the file size is not too large, the file can be transferred via HTTPS in the usual way. The submodelDescriptor for this case can be defined like:

{
   ...
   "submodelDescriptors": [
      {
         "idShort": "quality-data",
         "id": "<uuid>",
         "endpoints": [
            {
               "interface": "SUBMODEL-ATTACHMENT-3.0",
               "protocolInformation": {
                  "href": "<provider-edc-dataplane-url>/<path-to-download-file>",
                  "endpointProtocol": "HTTP",
                  "endpointProtocolVersion": [
                     "1.1"
                  ],
                  "subprotocol": "",
                  "subprotocolBody": "id=<edc-asset-id>;dspEndpoint=<controlplane-url>",
                  "subprotocolBodyEncoding": "application/octet-stream",
                  "securityAttributes": [
                     {
                        "type": "NONE",
                        "key": "NONE",
                        "value": "NONE"
                     }
                  ]
               }
            }
         ],
         "semanticId": {
            "type": "ExternalReference",
            "keys": [
               {
                  "type": "GlobalReference",
                  //TBD: New model which maps the flattened hierarchy to the model (e.g. with "_" or ".")
                  "value": "urn:samm:io.catenax.vehicle.product_description:3.0.0#FlattenedProductDescription"
               }
            ]
         },
         "description": [
            {
               "language": "en",
               "text": "submodel-descriptor for quality data which will be transferred via HTTPS."
            }
         ]
      }
   ]
}
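
All descriptors in this document carry the same `subprotocolBody` layout (`id=...;dspEndpoint=...`). A consumer could split it as sketched below; the function name is hypothetical, and the sketch assumes that values never contain a `;`.

```python
from typing import Dict


def parse_subprotocol_body(body: str) -> Dict[str, str]:
    """Split 'id=<asset>;dspEndpoint=<url>' into a dict.

    Each pair is split on the first '=' only, so a URL value that
    contains '=' in its query string stays intact.
    """
    result: Dict[str, str] = {}
    for part in body.split(";"):
        if not part.strip():
            continue  # tolerate a trailing ';'
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in segment: {part!r}")
        result[key.strip()] = value.strip()
    return result
```

The resulting `id` and `dspEndpoint` values are what the consumer needs to start the negotiation with the provider's controlplane.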

Solution 4: File transfer via S3 bucket with EDC AWS extension

Introduction to File Transfer process in EDC

  1. Provider uploads the parquet file to an S3 bucket.
  2. Provider creates an edc-asset. The dataAddress includes information about the S3 bucket and the file name:
    {
      "edc:dataAddress": {
        "edc-type": "AmazonS3",
        "edc:bucket": "<provider-bucket>",
        "edc:keyName": "<file-name>"
      }
    }
    See also full example here: edc-asset for file transfer
  3. Provider creates a shell with shell descriptors and the edc-asset with the controlplane URL.
  4. Consumer calls the DTR via EDC and reads the submodelDescriptor from the shell.
  5. Consumer starts the negotiation/transfer process and provides their own bucket information:
    {
      "edc:dataDestination": {
        "edc-type": "AmazonS3",
        "edc:bucket": "<consumer-bucket>",
        "edc:keyName": "<file-name>",
        "edc:accessKeyId":"<access-key-id>",
        "edc:secretAccessKey":"<secretAccessKey>"
      }
    }
  6. Provider receives the request, downloads the requested file from its bucket, and uploads it to the consumer bucket.
  7. Consumer can download the file from their bucket.
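
The two JSON fragments from steps 2 and 5 can be generated as in the sketch below. Only the keys shown in this document are used; the surrounding EDC request envelope is not part of this thread and is left out, and both function names are hypothetical.

```python
import json
from typing import Dict


def provider_data_address(bucket: str, key_name: str) -> Dict[str, dict]:
    """dataAddress fragment for the provider's edc-asset (step 2)."""
    return {
        "edc:dataAddress": {
            "edc-type": "AmazonS3",
            "edc:bucket": bucket,
            "edc:keyName": key_name,
        }
    }


def consumer_data_destination(
    bucket: str, key_name: str, access_key_id: str, secret_access_key: str
) -> Dict[str, dict]:
    """dataDestination fragment sent by the consumer (step 5)."""
    return {
        "edc:dataDestination": {
            "edc-type": "AmazonS3",
            "edc:bucket": bucket,
            "edc:keyName": key_name,
            "edc:accessKeyId": access_key_id,
            "edc:secretAccessKey": secret_access_key,
        }
    }


if __name__ == "__main__":
    # Placeholder values only; real credentials would come from a secret store.
    print(json.dumps(provider_data_address("provider-bucket", "data.parquet"), indent=2))
```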

SubmodelDescriptor for Parquet Data (via File Transfer)

For the file transfer via S3 bucket the submodelDescriptor can be defined like:

{
...
  "submodelDescriptors": [
    {
      "idShort": "quality-data",
      "id": "<uuid>",
      "endpoints": [
        {
           // TODO: Check which interface can be used here. See: https://github.com/admin-shell-io/questions-and-answers?tab=readme-ov-file#id47
          "interface": "TBD", // SUBMODEL-3.0 cannot be used because with EDC AWS S3 the consumer only triggers the event; it is an async call. No other type from the IDTA standards can be used here.
          "protocolInformation": {
           // TODO: href is required. Check whether inserting NONE is possible.
            "href": "not used for transfer via s3 bucket",
            "endpointProtocol": "HTTP",
            "endpointProtocolVersion": [
              "1.1"
            ],
            "subprotocol": "",
            "subprotocolBody": "id=<edc-asset-id>;dspEndpoint=<controlplane-url>",
             // Since the action only triggers an async event, the encoding is not relevant.
            "subprotocolBodyEncoding": "TBD", 
            "securityAttributes": [
              {
                "type": "NONE",
                "key": "NONE",
                "value": "NONE"
              }
            ]
          }
        }
      ],
      "semanticId": {
        "type": "ExternalReference",
        "keys": [
          {
            "type": "GlobalReference",
            "value": "urn:samm:io.catenax.vehicle.product_description:3.0.0#ProductDescription"
          }, 
           {
              "type": "GlobalReference",
              //TBD: New model which maps the flattened hierarchy to the model (e.g. with "_" or "."). Outcome of the issue: https://github.com/eclipse-tractusx/sldt-semantic-models/issues/762
              "value": "urn:samm:io.catenax.vehicle.product_description:3.0.0#FlattenedProductDescription"
           }
        ]
      },
      "description": [
        {
          "language": "en",
          "text": "submodel-descriptor for quality data which will be transferred via S3 bucket."
        }
      ]
    }
  ]
}

| Parameter | Value | Description |
| --- | --- | --- |
| href | "" | Not used for transfer via S3 bucket. |
| subprotocolBody | id=<edc-asset-id>;dspEndpoint=<controlplane-url> | Includes information about the provider EDC controlplane and the edc-asset ID. |
| subprotocolBodyEncoding | application/octet-stream;type=parquet-snappy | Format of the file. The file will be transferred via bucket. |
| semanticId.keys[].value | urn:samm:io.catenax.vehicle.product_description:3.0.0#FlattenedProductDescription | Aspect model which maps the flattened hierarchy to the model. |
| description[].text | submodel-descriptor for quality data which will be transferred via S3 bucket. | Further description for the consumer. |
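
If `subprotocolBodyEncoding` carries a media type with parameters, as in the table's `application/octet-stream;type=parquet-snappy`, a consumer could split it with the standard library as sketched below. The helper name `parse_encoding` is hypothetical, and whether the `type` parameter will be used this way is still open in this thread.

```python
from email.message import Message
from typing import Dict, Tuple


def parse_encoding(value: str) -> Tuple[str, Dict[str, str]]:
    """Split a media type string into (type/subtype, parameters)."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns [(media_type, ''), (param, value), ...];
    # the first entry is the media type itself, so it is skipped here.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params
```

The same helper also handles the `application/json; charset=UTF-8` value used for JSON submodels.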

Solution 5: File transfer via S3 bucket without EDC AWS extension

Introduction to File Transfer process in EDC

  1. Provider uploads the parquet file to an S3 bucket.
  2. Provider creates a shell with shell descriptors and the edc-asset, controlplane URL, S3 bucket link, and credentials to access the provider's S3 bucket.
  3. Consumer calls the DTR via EDC and reads the submodelDescriptor from the shell.
  4. Consumer reads the credentials and starts downloading the file from the provider's S3 bucket.

SubmodelDescriptor for Parquet Data (via File Transfer)

For the file transfer via S3 bucket the submodelDescriptor can be defined like:

{
...
  "submodelDescriptors": [
    {
      "idShort": "quality-data",
      "id": "<uuid>",
      "endpoints": [
        {
           // TODO: Check which interface can be used here. See: https://github.com/admin-shell-io/questions-and-answers?tab=readme-ov-file#id47
          "interface": "SUBMODEL-ATTACHMENT-3.0",
          "protocolInformation": {
           //Link to Provider AWS S3 Bucket (S3 URI)
            "href": "https://example-bucket.s3.us-east-2.amazonaws.com/productDescription.parquet",
            "endpointProtocol": "HTTP",
            "endpointProtocolVersion": [
              "1.1"
            ],
            "subprotocol": "",
            "subprotocolBody": "id=<edc-asset-id>;dspEndpoint=<controlplane-url>",
            "subprotocolBodyEncoding": "application/octet-stream",
             // TODO: Check credentials for S3 bucket can be used here
            "securityAttributes": [
              {
                // TODO: Clarify which enum can be used
                "type": "NONE",
                // S3 securityKey
                "key": "SecurityKey",
                // S3 accessKey
                "value": "AccessKey"
              }
            ]
          }
        }
      ],
      "semanticId": {
        "type": "ExternalReference",
        "keys": [
          {
            "type": "GlobalReference",
            "value": "urn:samm:io.catenax.vehicle.product_description:3.0.0#ProductDescription"
          }, 
           {
              "type": "GlobalReference",
              //TBD: New model which maps the flattened hierarchy to the model (e.g. with "_" or "."). Outcome of the issue: https://github.com/eclipse-tractusx/sldt-semantic-models/issues/762
              "value": "urn:samm:io.catenax.vehicle.product_description:3.0.0#FlattenedProductDescription"
           }
        ]
      },
      "description": [
        {
          "language": "en",
          "text": "submodel-descriptor for quality data which will be transferred via S3 bucket."
        }
      ]
    }
  ]
}
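
For this variant the consumer has to derive the bucket, region, and object key from the `href`. A minimal sketch for virtual-hosted-style URLs such as the one above follows; the function name is hypothetical and path-style URLs are deliberately not handled.

```python
import re
from typing import Tuple
from urllib.parse import urlparse

# Matches hosts like <bucket>.s3.<region>.amazonaws.com (or s3-<region>).
_VHOST = re.compile(r"^(?P<bucket>.+)\.s3[.-](?P<region>[a-z0-9-]+)\.amazonaws\.com$")


def split_s3_url(href: str) -> Tuple[str, str, str]:
    """Return (bucket, region, key) for a virtual-hosted-style S3 URL."""
    parsed = urlparse(href)
    match = _VHOST.match(parsed.hostname or "")
    if match is None:
        raise ValueError(f"not a virtual-hosted-style S3 URL: {href!r}")
    # The object key is the URL path without its leading slash.
    return match["bucket"], match["region"], parsed.path.lstrip("/")
```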

Solution 6: File transfer via two-step process

  1. Provider uploads the parquet file to an S3 bucket.
  2. Provider creates an edc-asset. The dataAddress includes information about the S3 bucket and the file name:
{
  "edc:dataAddress": {
    "edc-type": "AmazonS3",
    "edc:bucket": "<provider-bucket>",
    "edc:keyName": "<file-name>"
  }
}
  3. Provider creates a simple submodel/aspect. The new aspect model will contain the reference/MetaInformation to the parquet file as a property (i.e. controlplane URL, edc-assetId). It also contains the semantics of the parquet file.
{
   "fileType":"parquet",
   "semanticModel":"urn:samm:io.catenax.vehicle.product_description:3.0.0#FlattenedProductDescription",
   "transferObject":{
      "transferType":"EDC AWS S3 Bucket",
      "href":"NONE",
      "subprotocolBody":"id=edc-asset-1;dspEndpoint=controlplaneUrl",
      "description":"File will be uploaded via EDC AWS Extension to consumer S3 Bucket."
   }
}
  4. Provider registers the submodel URL from step 3 in the EDC as an edc-asset.
  5. Provider creates a shell with shell descriptors and the edc-asset from step 4 with the controlplane URL.
{
   ...
   "submodelDescriptors": [
      {
         "idShort": "quality-data",
         "id": "<uuid>",
         "endpoints": [
            {
               "interface": "SUBMODEL-3.0",
               "protocolInformation": {
                  "href": "<provider-edc-dataplane-url>/<path-to-submodel-service>/$value",
                  "endpointProtocol": "HTTP",
                  "endpointProtocolVersion": [
                     "1.1"
                  ],
                  "subprotocol": "",
                  "subprotocolBody": "id=<edc-asset-id>;dspEndpoint=<controlplane-url>",
                  "subprotocolBodyEncoding": "application/json; charset=UTF-8",
                  "securityAttributes": [
                     {
                        "type": "NONE",
                        "key": "NONE",
                        "value": "NONE"
                     }
                  ]
               }
            }
         ],
         "semanticId": {
            "type": "ExternalReference",
            "keys": [
               {
                  "type": "GlobalReference",
                  // Semantic model for metainformation of parquet file (location of parquet file, etc.)
                  "value": "urn:samm:io.catenax.vehicle.product_description:3.0.0#MetaInformation"
               }
            ]
         },
         "description": [
            {
               "language": "en",
               "text": "submodel-descriptor for quality data MetaInformation. This submodel will return the location of the parquet file."
            }
         ]
      }
   ]
}
  6. Consumer calls the DTR via EDC and reads the submodelDescriptor from the shell.
  7. Consumer starts the negotiation/transfer process with the provider EDC to get the data behind the edc-asset from step 4.
  8. Consumer calls the submodel from step 4 through the EDC. The response will contain information about the location of the parquet file (edc-asset ID, controlplane, etc.).
  9. Consumer starts the negotiation/transfer process with the provider EDC to get the parquet file.
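
On the consumer side, the second step starts from the MetaInformation response. The sketch below extracts what is needed for the follow-up negotiation. It assumes the response has the shape of the example aspect above and that `subprotocolBody` uses `key=value` pairs separated by `;`; `locate_parquet` and the sample values are hypothetical.

```python
import json
from typing import Dict

SAMPLE_META = """{
  "fileType": "parquet",
  "semanticModel": "urn:samm:io.catenax.vehicle.product_description:3.0.0#FlattenedProductDescription",
  "transferObject": {
    "transferType": "EDC AWS S3 Bucket",
    "href": "NONE",
    "subprotocolBody": "id=edc-asset-1;dspEndpoint=https://controlplane.example.invalid/api/v1/dsp"
  }
}"""


def locate_parquet(meta_json: str) -> Dict[str, str]:
    """Read the MetaInformation response and return the second-step parameters."""
    meta = json.loads(meta_json)
    body = meta["transferObject"]["subprotocolBody"]
    # Split on the first '=' only so URLs containing '=' stay intact.
    params = dict(part.split("=", 1) for part in body.split(";") if part)
    return {
        "fileType": meta["fileType"],
        "semanticModel": meta["semanticModel"],
        "assetId": params["id"],
        "dspEndpoint": params["dspEndpoint"],
    }
```

The returned `assetId` and `dspEndpoint` feed the second negotiation/transfer process that delivers the parquet file itself.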

@tunacicek tunacicek moved this from In Progress to In Review in 🚀SLDT Board Aug 5, 2024
@agg3fe

agg3fe commented Sep 13, 2024

As discussed with Birgit, the submodel interface value 'productDescription' does not conform to IDTA standards. We need to use 'SUBMODEL-3.0' as the interface value.
We need to create a simple submodel/aspect, as we have done before. The new aspect model will contain the reference to the Parquet file as a property. It will also contain the semantics of that Parquet file and the conversion fields/values required.

We need to check the mechanism currently used through EDC to exchange the consumer bucket credentials. Through this newly defined submodel, the consumer won't be able to provide the bucket credentials, so we need to find a way to make an async call to the provider and get the Parquet file location in the response.

An example submodel is attached: ProductDesriptionAsParquetFile.txt

@tunacicek
Contributor

Issue for asynch. call was created:
admin-shell-io/aas-specs-api#347

Projects
Status: In Progress

4 participants