docs: syncset docs (open-policy-agent#3202)
Signed-off-by: Alex Pana <8968914+acpana@users.noreply.github.com>
Signed-off-by: alex <8968914+acpana@users.noreply.github.com>
Co-authored-by: Rita Zhang <rita.z.zhang@gmail.com>
Co-authored-by: Sertaç Özercan <852750+sozercan@users.noreply.github.com>
3 people authored and leewoobin789 committed Apr 1, 2024
1 parent b5ae5cf commit a7035b8
Showing 1 changed file with 49 additions and 7 deletions.
56 changes: 49 additions & 7 deletions website/docs/sync.md
@@ -3,15 +3,46 @@ id: sync
title: Replicating Data
---

`Feature State`: The `Config` resource is currently alpha.
## Replicating Data

> The "Config" resource must be named `config` for it to be reconciled by Gatekeeper. Gatekeeper will ignore the resource if you do not name it `config`.
Some constraints are impossible to write without access to more state than just the object under test. For example, it is impossible to know if a label is unique across all pods and namespaces unless a ConstraintTemplate has access to all other pods and namespaces. To enable this use case, we provide syncing of data into a data client.

### Replicating Data with SyncSets (Recommended)

`Feature State`: Gatekeeper version v3.15+ (alpha)

Kubernetes data can be replicated into the data client using `SyncSet` resources. Below is an example of a `SyncSet`:

```yaml
apiVersion: syncset.gatekeeper.sh/v1alpha1
kind: SyncSet
metadata:
  name: syncset-1
spec:
  gvks:
    - group: ""
      version: "v1"
      kind: "Namespace"
    - group: ""
      version: "v1"
      kind: "Pod"
```
The resources listed in the `gvks` field of a SyncSet will eventually be synced into the data client.

Some constraints are impossible to write without access to more state than just the object under test. For example, it is impossible to know if an ingress's hostname is unique among all ingresses unless a rule has access to all other ingresses. To make such rules possible, we enable syncing of data into OPA.
#### Working with SyncSet resources

The [audit](audit.md) feature does not require replication by default. However, when the ``audit-from-cache`` flag is set to true, the audit informer cache will be used as the source-of-truth for audit queries; thus, an object must first be cached before it can be audited for constraint violations.
* Updating a SyncSet's `gvks` field should dynamically update what objects are synced.
* Multiple `SyncSet`s may be defined and those will be reconciled by the Gatekeeper syncset-controller. Notably, the [set union](https://en.wikipedia.org/wiki/Union_(set_theory)) of all SyncSet resources' `gvks` and the [Config](sync#replicating-data-with-config) resource's `syncOnly` will be synced into the data client.
* A resource will continue to be present in the data client so long as a SyncSet or Config still specifies it under the `gvks` or `syncOnly` field.
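
As a rough illustration of the union behavior described above, the sketch below defines two SyncSets. The names `syncset-core` and `syncset-networking` and the choice of GVKs are hypothetical, picked only for this example. Assuming both are applied, the data client would be expected to hold Namespaces, Pods, and Ingresses; if `syncset-networking` were later deleted, Pods should remain synced because `syncset-core` still lists them.

```yaml
# Hypothetical SyncSet names; the GVKs are illustrative only.
apiVersion: syncset.gatekeeper.sh/v1alpha1
kind: SyncSet
metadata:
  name: syncset-core
spec:
  gvks:
    - group: ""
      version: "v1"
      kind: "Namespace"
    - group: ""
      version: "v1"
      kind: "Pod"
---
apiVersion: syncset.gatekeeper.sh/v1alpha1
kind: SyncSet
metadata:
  name: syncset-networking
spec:
  gvks:
    - group: "networking.k8s.io"
      version: "v1"
      kind: "Ingress"
    # Pod is listed a second time on purpose: the synced set is the
    # union of all `gvks` entries, so the overlap is harmless.
    - group: ""
      version: "v1"
      kind: "Pod"
```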

Kubernetes data can be replicated into the audit cache via the sync config resource. Currently resources defined in `syncOnly` will be synced into OPA. Updating `syncOnly` should dynamically update what objects are synced. Below is an example:
### Replicating Data with Config

`Feature State`: Gatekeeper version v3.6+ (alpha)

> The "Config" resource must be named `config` for it to be reconciled by Gatekeeper. Gatekeeper will ignore the resource if you do not name it `config`.

Kubernetes data can also be replicated into the data client via the Config resource. Resources defined in `syncOnly` will be synced into OPA. Below is an example:

```yaml
apiVersion: config.gatekeeper.sh/v1alpha1
@@ -36,11 +67,22 @@ You can install this config with the following command:
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/master/demo/basic/sync.yaml
```

Once data is synced into OPA, rules can access the cached data under the `data.inventory` document.
#### Working with Config resources

The `data.inventory` document has the following format:
* Updating a Config's `syncOnly` field should dynamically update what objects are synced.
* The `Config` resource is meant to be a singleton. The [set union](https://en.wikipedia.org/wiki/Union_(set_theory)) of all SyncSet resources' `gvks` and the [Config](sync#replicating-data-with-config) resource's `syncOnly` will be synced into the data client.
* A resource will continue to be present in the data client so long as a SyncSet or Config still specifies it under the `gvks` or `syncOnly` field.

### Accessing replicated data

Once data is synced, ConstraintTemplates can access the cached data under the `data.inventory` document.

The `data.inventory` document has the following format:
* For cluster-scoped objects: `data.inventory.cluster[<groupVersion>][<kind>][<name>]`
  * Example referencing the Gatekeeper namespace: `data.inventory.cluster["v1"].Namespace["gatekeeper"]`
* For namespace-scoped objects: `data.inventory.namespace[<namespace>][<groupVersion>][<kind>][<name>]`
  * Example referencing the Gatekeeper pod: `data.inventory.namespace["gatekeeper"]["v1"]["Pod"]["gatekeeper-controller-manager-d4c98b788-j7d92"]`
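
To make the `data.inventory` layout above more concrete, here is a minimal sketch of a ConstraintTemplate and a matching Constraint that look up other Pods in the replicated data. It assumes Pods are already being synced (via a SyncSet or the Config resource). The kind `K8sUniquePodLabel`, the constraint name `unique-app-label`, the `label` parameter, and the violation message are all hypothetical and exist only for this example; it is not a policy shipped with Gatekeeper.

```yaml
# Hypothetical template: rejects a Pod whose value for a given label
# is already used by another Pod found in the replicated inventory.
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8suniquepodlabel
spec:
  crd:
    spec:
      names:
        kind: K8sUniquePodLabel
      validation:
        openAPIV3Schema:
          type: object
          properties:
            label:
              type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8suniquepodlabel

        # True when both objects share the same namespace and name,
        # so a Pod is never reported as conflicting with itself.
        same_object(a, b) {
          a.metadata.namespace == b.metadata.namespace
          a.metadata.name == b.metadata.name
        }

        violation[{"msg": msg}] {
          label := input.parameters.label
          value := input.review.object.metadata.labels[label]
          # Namespace-scoped lookup:
          # data.inventory.namespace[<namespace>][<groupVersion>][<kind>][<name>]
          other := data.inventory.namespace[ns]["v1"]["Pod"][name]
          other.metadata.labels[label] == value
          not same_object(other, input.review.object)
          msg := sprintf("label %v=%v is already used by pod %v/%v", [label, value, ns, name])
        }
---
# Hypothetical constraint that applies the template to Pods.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sUniquePodLabel
metadata:
  name: unique-app-label
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    label: "app"
```

Because the lookup only sees objects that have been replicated, a rule like this would silently find no conflicts if Pods were not listed in any SyncSet or in the Config resource's `syncOnly`.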

### Auditing From Cache

The [audit](audit.md) feature does not require replication by default. However, when the `audit-from-cache` flag is set to true, the audit informer cache will be used as the source-of-truth for audit queries; thus, an object must first be cached before it can be audited for constraint violations. Kubernetes data can be replicated into the audit cache via one of the resources above.
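
As a sketch of where the flag might be set, the fragment below shows container arguments for the audit process. It assumes audit runs as its own Deployment whose container is named `manager`; both of those details, and the surrounding structure, are assumptions to be checked against the manifests of the installed Gatekeeper version.

```yaml
# Fragment of a Deployment spec; unrelated fields are omitted.
# The container name "manager" is an assumption for this sketch.
spec:
  template:
    spec:
      containers:
        - name: manager
          args:
            - --operation=audit
            # Use the audit informer cache as the source of truth,
            # so only replicated objects are audited.
            - --audit-from-cache=true
```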
