Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: syncset docs #3202

Merged
merged 9 commits into from
Jan 15, 2024
Merged
48 changes: 41 additions & 7 deletions website/docs/sync.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,15 +3,38 @@ id: sync
title: Replicating Data
---

`Feature State`: The `Config` resource is currently alpha.
## Replicating Data

> The "Config" resource must be named `config` for it to be reconciled by Gatekeeper. Gatekeeper will ignore the resource if you do not name it `config`.
Some constraints are impossible to write without access to more state than just the object under test. For example, it is impossible to know if an ingress's hostname is unique among all ingresses unless a ConstraintTemplate has access to all other ingresses. To enable this use case, we provide syncing of data into a data client.

Some constraints are impossible to write without access to more state than just the object under test. For example, it is impossible to know if an ingress's hostname is unique among all ingresses unless a rule has access to all other ingresses. To make such rules possible, we enable syncing of data into OPA.
### Replicating Data with SyncSets (Recommended)

The [audit](audit.md) feature does not require replication by default. However, when the ``audit-from-cache`` flag is set to true, the audit informer cache will be used as the source-of-truth for audit queries; thus, an object must first be cached before it can be audited for constraint violations.
`Feature State`: Gatekeeper version v3.15+ (alpha)

Kubernetes data can be replicated into the audit cache via the sync config resource. Currently resources defined in `syncOnly` will be synced into OPA. Updating `syncOnly` should dynamically update what objects are synced. Below is an example:
Kubernetes data can be replicated into the data client using `SyncSet` resources. Below is an example of a `SyncSet`:

```yaml
apiVersion: syncset.gatekeeper.sh/v1alpha1
kind: SyncSet
metadata:
name: syncset-1
namespace: "gatekeeper-system"
acpana marked this conversation as resolved.
Show resolved Hide resolved
spec:
gvks:
- group: "networking.k8s.io"
version: "v1"
kind: "Ingress"
```

The resources defined in the `gvks` field of a SyncSet will be eventually synced into the data client.

### Replicating Data with Config

`Feature State`: Gatekeeper version v3.6+ (alpha)

> The "Config" resource must be named `config` for it to be reconciled by Gatekeeper. Gatekeeper will ignore the resource if you do not name it `config`.

Kubernetes data can also be replicated into the data client via the Config resource. Resources defined in `syncOnly` will be synced into OPA. Below is an example:

```yaml
apiVersion: config.gatekeeper.sh/v1alpha1
Expand All @@ -36,11 +59,22 @@ You can install this config with the following command:
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/master/demo/basic/sync.yaml
```

Once data is synced into OPA, rules can access the cached data under the `data.inventory` document.
### Working with SyncSet and Config resources
acpana marked this conversation as resolved.
Show resolved Hide resolved

The `data.inventory` document has the following format:
* Updating a Config's `syncOnly` field or a SyncSet's `gvks` field should dynamically update what objects are synced.
* Multiple `SyncSet`s may be defined and those will be reconciled by the Gatekeeper syncset-controller. Notably, the [set union](https://en.wikipedia.org/wiki/Union_(set_theory)) of all SyncSet resources' `gvks` and the [Config](sync#replicating-data-with-config) resource's `syncOnly` will be synced into the data client.
* A resource will continue to be present in the data client so long as a SyncSet or Config still specifies it under the `gvks` or `syncOnly` field.

### Accessing replicated data

Once data is synced, ConstraintTemplates can access the cached data under the `data.inventory` document.

The `data.inventory` document has the following format:
* For cluster-scoped objects: `data.inventory.cluster[<groupVersion>][<kind>][<name>]`
* Example referencing the Gatekeeper namespace: `data.inventory.cluster["v1"].Namespace["gatekeeper"]`
* For namespace-scoped objects: `data.inventory.namespace[<namespace>][groupVersion][<kind>][<name>]`
* Example referencing the Gatekeeper pod: `data.inventory.namespace["gatekeeper"]["v1"]["Pod"]["gatekeeper-controller-manager-d4c98b788-j7d92"]`

### Auditing From Cache

The [audit](audit.md) feature does not require replication by default. However, when the ``audit-from-cache`` flag is set to true, the audit informer cache will be used as the source-of-truth for audit queries; thus, an object must first be cached before it can be audited for constraint violations. Kubernetes data can be replicated into the audit cache via one of the resources above.
acpana marked this conversation as resolved.
Show resolved Hide resolved