### AWS Workload Mapping 📓

I wrote this up a while ago; I can already see some questionable aspects and will revise, but I'm throwing it up here for us to kick the tires over.

**Goal:** Provide a means to map a workload to its source of origin (code), with strong cryptographic guarantees, preferably agentless, using AWS services and driven by mediator.

**Persona:** I am an IT operations team and we use AWS. I want trusted insight into the provenance of software developed by our engineering teams. I want to know who wrote the code, where it came from (repository), whether the repository was secure, and what the immediate and transitive dependencies of the code were. I also want to know where, when and by whom the code was deployed. I want this in a single location / pane of glass. If a vulnerability is found in my engineers' code, I want to remediate precisely where the code is deployed.

### Source of origin provenance

An asset inventory of the code is generated (SLSA-style attestation) at release time, either by the code hosting provider's CI (GitHub) or within mediator. The bare essentials of this are the dependencies used and the digests of any compiled binary, or in the case of an interpreted language, the bundled format file(s). Alongside the standard SBOM-style information are details gathered from the code repository, such as the IDs of the developer(s) who contributed the code, the commit SHA, etc. This is all standard stuff we can already do + biometrics fun etc.!

### Mediator Lambda Function

The MLF's role is to generate provenance of the code at application creation time. You can consider it a replication of the provenance step performed at code submit time (described in the source of origin provenance section above), replayed at the deployment end. In addition to capturing source provenance, the MLF also writes the S3 bucket's unique identifier to the metadata. This is all then rounded up within a file.

### Setup steps (driven by mediator posting to AWS APIs)

1. An MLF (Mediator Lambda Function) is pushed via the AWS API to the relevant AWS ARN by a mediator user account, using access granted from OpenID Connect federation between AWS and Mediator. The MLF contains no customer code or functionality.
2. The AWS Key Management Service is called via a mediator API to create a cryptographic key pair. The private key remains in a secure AWS vault; the public key is retrieved and stored within mediator (required for later verification). A minimal sketch of this provisioning call follows this list.
3. An AWS EventBridge rule is created by mediator, which notifies the MLF on events such as an S3 bucket being created.
4. The MLF is granted authorized access to the private key object within AWS KMS to perform signing operations.
5. Upon receiving an event (bucket deployed with objects) via the EventBridge bus, the MLF is triggered and generates a provenance file `provenance_$s3_id.json`.
6. The provenance file is signed with the KMS-resident private key.
7. The MLF-generated provenance file, now a signed payload, is POSTed to mediator's workload mapping API (steps 5–7 are sketched below).

Mediator now possesses a payload containing an S3 bucket ID, ARN and SBOM-style metadata, with a cryptographic signature telling us that mediator-provisioned AWS KMS keys were used to generate it. If we get sent junk, we will know, as a public key will not be in our possession to perform a cryptographic verification.

We can now store those dependencies in a database / graph, and should a single dependency match a vulnerability within https://osv.dev/ (or dependabot), we can inform the user, who can then remediate and fix within AWS.
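A minimal sketch of the step 2 key provisioning, assuming boto3; the key spec, region and description here are illustrative choices, not a statement of how mediator actually configures KMS:

```python
import boto3

kms = boto3.client("kms", region_name="us-east-1")  # region is an assumption

# Create an asymmetric key pair; the private key never leaves KMS.
key = kms.create_key(
    KeySpec="ECC_NIST_P256",
    KeyUsage="SIGN_VERIFY",
    Description="mediator workload-mapping signing key",  # illustrative
)
key_id = key["KeyMetadata"]["KeyId"]

# Retrieve the public half (DER-encoded SubjectPublicKeyInfo) to store
# within mediator for later verification of MLF-signed payloads.
public_key_der = kms.get_public_key(KeyId=key_id)["PublicKey"]
```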
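And a sketch of the MLF itself (steps 5–7): triggered by an EventBridge event for a newly created bucket, it assembles a provenance document, signs its digest with the KMS-resident key, and POSTs the signed payload to mediator. The event shape, the `KEY_ID` environment variable and the mediator endpoint URL are all assumptions for illustration:

```python
import base64
import hashlib
import json
import os
import urllib.request

import boto3

kms = boto3.client("kms")

def handler(event, context):
    # CloudTrail-backed EventBridge events carry the bucket name here
    # (assumed event shape).
    bucket = event["detail"]["requestParameters"]["bucketName"]

    provenance = {
        "s3_bucket": bucket,
        "arn": f"arn:aws:s3:::{bucket}",
        # SBOM-style metadata would be gathered here: dependencies,
        # binary digests, commit SHA, contributing developer IDs, ...
        "sbom": {},
    }
    payload = json.dumps(provenance, sort_keys=True).encode()

    # Sign the SHA-256 digest of the provenance file; the private key
    # never leaves KMS.
    sig = kms.sign(
        KeyId=os.environ["KEY_ID"],
        Message=hashlib.sha256(payload).digest(),
        MessageType="DIGEST",
        SigningAlgorithm="ECDSA_SHA_256",
    )["Signature"]

    # POST the signed payload to mediator's workload mapping API
    # (hypothetical URL).
    req = urllib.request.Request(
        "https://mediator.example.com/api/v1/workload-mapping",
        data=json.dumps({
            "provenance": provenance,
            "signature": base64.b64encode(sig).decode(),
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```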
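On the mediator side, verification and the osv.dev lookup could look roughly like this, assuming the `cryptography` package and osv.dev's public query API; the ecosystem/package arguments are placeholders:

```python
import base64
import json
import urllib.request

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import ec

def verify(public_key_der: bytes, payload: bytes, signature_b64: str) -> bool:
    """Check the payload against the public key we stored at setup time."""
    pub = serialization.load_der_public_key(public_key_der)
    try:
        pub.verify(base64.b64decode(signature_b64), payload,
                   ec.ECDSA(hashes.SHA256()))
        return True
    except Exception:
        return False  # junk: not signed with a key mediator provisioned

def osv_query(name: str, version: str, ecosystem: str = "PyPI") -> list:
    """Return known vulnerabilities for one dependency from osv.dev."""
    body = json.dumps({
        "package": {"name": name, "ecosystem": ecosystem},
        "version": version,
    }).encode()
    req = urllib.request.Request("https://api.osv.dev/v1/query", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("vulns", [])
```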
This whole flow is also possible for container images (ECS) and EC2 instances. Likewise, if a code repository is hacked or a developer is sacked for misconduct, we know which workloads may be compromised.

### Belt and braces approach (deploy from mediator)

It is of course possible that mediator could deploy the S3 bucket as well. GitHub sends a release event to mediator's GitHub App webhook; we generate provenance, create AWS keys and set up EventBridge, all as a one-shot event, and deploy the bucket using the AWS API. We can then strip down the infra, short of the running S3 bucket, as we will have the provenance + S3 bucket ID stored within mediator. A rough sketch of this one-shot flow is at the end of this comment.

👀 @evankanderson @eryn-muetzel @yrobla @dussab @jhrozek @JAORMX @craigmcl
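A hedged sketch of that one-shot flow, assuming boto3; the function name, bucket naming scheme and EventBridge rule pattern are illustrative assumptions, and for brevity the provenance/KMS steps defer to the sketches above:

```python
import json

import boto3

def on_github_release(release_event: dict, mlf_arn: str) -> None:
    """One-shot handler for a GitHub release webhook (hypothetical)."""
    s3 = boto3.client("s3")
    events = boto3.client("events")

    bucket = f"workload-{release_event['release']['id']}"  # illustrative

    # 1. Generate provenance from the release payload (see earlier sketches).
    # 2. Create the KMS key pair (kms.create_key, as sketched above).

    # 3. Route S3 CreateBucket events (via CloudTrail) to the MLF.
    events.put_rule(
        Name="mediator-s3-bucket-created",
        EventPattern=json.dumps({
            "source": ["aws.s3"],
            "detail-type": ["AWS API Call via CloudTrail"],
            "detail": {"eventName": ["CreateBucket"]},
        }),
    )
    events.put_targets(
        Rule="mediator-s3-bucket-created",
        Targets=[{"Id": "mlf", "Arn": mlf_arn}],
    )

    # 4. Deploy the bucket (regions outside us-east-1 also need a
    #    CreateBucketConfiguration). The rest of the infra can be stripped
    #    down afterwards, since provenance + bucket ID live in mediator.
    s3.create_bucket(Bucket=bucket)
```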
---
I would like to use this thread for ways in which we can map workloads to mediator.
### What is workload mapping?
Workload mapping is building a provenance trail from source code to a running workload. We can then understand the software supply chain from a change (Luke made PR 234, which introduced a new package called 'widgets') to a running workload (the change introduced in pull request 234 is instantiated in workloads `$UUID-1` and `$UUID-2`).
We should consider any environment or provider as a possible source of workload mapping.
Over to you!