Unable to delete config resource due to built-in Gatekeeper policy validator #3058

Closed
skaven81 opened this issue Oct 11, 2023 · 4 comments · Fixed by #3089

skaven81 commented Oct 11, 2023

What steps did you take and what happened:
I have the Gatekeeper webhook installed with validation of deletions enabled. This means that the Gatekeeper validating webhook calls validateConfigResource() when the Gatekeeper Config resource is deleted. That validation fails, resulting in:

    message: 'Failed to delete all resource types, 1 remaining: admission webhook
      "validation.gatekeeper.sh" denied the request: config resource must have name
      ''config'''

This only seems to happen when the config resource is deleted by the Kubernetes namespace controller, as part of its cascading deletions upon deleting a namespace.

The API call that the namespace controller makes looks like this in the API logs:

    I1011 20:15:29.273928       1 httplog.go:132] "HTTP" verb="DELETE" URI="/apis/config.gatekeeper.sh/v1alpha1/namespaces/gkcfg-dev/configs" latency="12.097685ms" userAgent="kube-controller-manager/v1.26.8 (linux/amd64) kubernetes/395f0a2/system:serviceaccount:kube-system:namespace-controller" audit-ID="b107da92-f1a7-4def-ae5a-fe98d818125b" srcIP="127.0.0.1:52480" apf_pl="workload-high" apf_fs="kube-system-service-accounts" apf_iseats=1 apf_fseats=0 apf_additionalLatency="0s" apf_execution_time="11.957524ms" resp=422

Observe that the response was a 422, and that the deletion targets not the config resource specifically but the whole collection, /apis/config.gatekeeper.sh/v1alpha1/namespaces/<ns>/configs. The more typical direct deletion of a single resource, /apis/config.gatekeeper.sh/v1alpha1/namespaces/<ns>/configs/config, seems to work fine.
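
The difference between the two request shapes is visible in the URI path alone. A minimal Python sketch (the helper `resource_name_from_uri` is hypothetical, written only to illustrate the point) pulls the resource name out of a namespaced API path and shows that the collection form carries none:

```python
from typing import Optional


def resource_name_from_uri(uri: str) -> Optional[str]:
    """Return the resource name in a namespaced API path, or None for a collection.

    Expects paths shaped like
    /apis/<group>/<version>/namespaces/<ns>/<plural>[/<name>]
    """
    parts = [p for p in uri.split("/") if p]
    # parts: ["apis", group, version, "namespaces", ns, plural, (optional name)]
    if len(parts) < 6 or parts[0] != "apis" or parts[3] != "namespaces":
        raise ValueError(f"unexpected path shape: {uri}")
    return parts[6] if len(parts) > 6 else None


collection = "/apis/config.gatekeeper.sh/v1alpha1/namespaces/gkcfg-dev/configs"
named = collection + "/config"

print(resource_name_from_uri(collection))  # None
print(resource_name_from_uri(named))       # config
```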

What I believe is happening is that validateConfigResource() doesn't grok how to deal with this method of deletion. The kube-controller-manager is effectively saying "delete all Configs" without specifying a name, and so validateConfigResource() returns a violation because the name is blank, not config.
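
To make that hypothesis concrete, here is a simplified Python model of the name check. It is not Gatekeeper's actual code (which is written in Go); `validate_config_resource`, its parameters, and the `skip_delete` switch are assumptions made for illustration. The switch shows one possible way to avoid the problem, not necessarily what an eventual fix would do.

```python
from typing import Optional

REQUIRED_NAME = "config"


def validate_config_resource(
    operation: str, name: str, skip_delete: bool = False
) -> Optional[str]:
    """Return a denial message, or None if the request is allowed.

    Hypothetical model: a collection delete reaches the check with an
    empty name, because the request does not target a single object.
    """
    if skip_delete and operation == "DELETE":
        # The name rule has nothing left to protect once the object is going away.
        return None
    if name != REQUIRED_NAME:
        return "config resource must have name 'config'"
    return None


# Direct deletion names the object, so the check passes.
print(validate_config_resource("DELETE", "config"))
# Collection deletion carries no name, so the check denies it.
print(validate_config_resource("DELETE", ""))
# With deletes skipped, the same collection deletion is allowed.
print(validate_config_resource("DELETE", "", skip_delete=True))
```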

What did you expect to happen:

Deleting a namespace containing a Gatekeeper OPA Config resource should not result in the kube-controller-manager getting blocked trying to delete the config resource because of a Gatekeeper validation error.

Environment:

  • Gatekeeper version: 3.10.0
  • Kubernetes version: Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.8", GitCommit:"395f0a2fdc940aeb9ab88849e8fa4321decbf6e1", GitTreeState:"clean", BuildDate:"2023-08-24T00:43:07Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}

skaven81 added the bug label Oct 11, 2023

acpana (Contributor) commented Oct 18, 2023

Thanks for filing this, @skaven81.

IIUC, one should be able to replicate this locally with kubectl delete --raw "/apis/config.gatekeeper.sh/v1alpha1/namespaces/<ns>/configs", but I am not able to reproduce it in my test kind cluster.

Would you mind sharing the YAML of the config resource for which this scenario errors out? And maybe the Gatekeeper installation too?

skaven81 (Author) commented

The specific use case was that I was developing a controller using the KOPF framework that would dynamically build a Gatekeeper Config resource in response to live changes in the cluster.

While testing my code, I had created a dev namespace (other than gatekeeper-system) where I deployed my controller. The controller then dutifully created a config resource in the same namespace. When I was satisfied that everything was working, I tore down my dev environment by deleting the namespace. I observed that the namespace would not go away, and that it was hung up on the kubernetes finalizer, because the config resource couldn't be deleted.

I was able to run kubectl -n <dev-ns> delete config config, and then the namespace went away. Directly deleting the config resource seems to work fine. Only the indirect deletion caused by namespace deletion seems to trip up Gatekeeper.

acpana (Contributor) commented Oct 18, 2023

thanks for the additional info!

> I had created a dev namespace ... created a config resource in the same namespace

There it is, which is why I had asked for the config resource definition. Using a custom namespace other than gatekeeper-system for the config resource allows me to repro the issue:

    $ kubectl delete --raw "/apis/config.gatekeeper.sh/v1alpha1/namespaces/gatekeeper-system-dev/configs"
    Error from server: admission webhook "validation.gatekeeper.sh" denied the request: config resource must have name 'config'

I'll look into why this happens in this case and submit a patch.

acpana self-assigned this Oct 18, 2023

skaven81 (Author) commented

> Using a custom namespace, other than gatekeeper-system for the config resources

I suspect that's because the gatekeeper-system namespace is exempted from all policy evaluations by default. If you disabled that behavior by removing the --exempt-namespace=gatekeeper-system argument from the webhook deployment, and/or removed the admission.gatekeeper.sh/ignore: "true" label from the gatekeeper-system namespace, you'd likely be able to reproduce it in that namespace too.
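
A toy Python model of that suspicion, under the assumption that the webhook is simply not consulted for namespaces carrying the admission.gatekeeper.sh/ignore label. `webhook_sees_namespace` is a hypothetical helper, not part of Gatekeeper, and the real selection logic lives in the cluster's webhook configuration:

```python
from typing import Dict

IGNORE_LABEL = "admission.gatekeeper.sh/ignore"


def webhook_sees_namespace(namespace_labels: Dict[str, str]) -> bool:
    """Assumed model: requests in a namespace with the ignore label skip the webhook."""
    return IGNORE_LABEL not in namespace_labels


# A labelled namespace (as gatekeeper-system is said to be by default) is skipped,
# so the name check never runs there.
print(webhook_sees_namespace({IGNORE_LABEL: "true"}))  # False
# An unlabelled dev namespace is validated, which is where the denial shows up.
print(webhook_sees_namespace({}))                      # True
```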
