Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization #1475

Conversation

camilamacedo86
Copy link
Contributor

@camilamacedo86 camilamacedo86 commented Nov 17, 2024

This commit removes the use of the kube-rbac-proxy image and replaces it with metrics authentication/authorization provided by controller-runtime. The kube-rbac-proxy image is deprecated and will no longer be maintained, which introduces risks to production environments. For more details, see: kubernetes-sigs/kubebuilder#3907

PR for catalogd: operator-framework/catalogd#460
Motivation: #1509

Tests done locally

To ensure backwards compatibility

Create the clusterrolebinding

$ kubectl create clusterrolebinding operator-controller-metrics-binding \
   --clusterrole=operator-controller-metrics-reader \
   --serviceaccount=olmv1-system:operator-controller-controller-manager

Create the TOKEN

TOKEN=$(kubectl create token operator-controller-controller-manager -n olmv1-system)
echo $TOKEN

Use curl to validate

kubectl label namespace olmv1-system pod-security.kubernetes.io/enforce-
kubectl run curl-metrics --rm -it --restart=Never \
  --image=curlimages/curl:7.87.0 -n olmv1-system -- /bin/sh

Call the metrics:

 curl -v -k -H "Authorization: Bearer $TOKEN" \
> https://operator-controller-controller-manager-metrics-service.olmv1-system.sv
c.cluster.local:8443/metrics
*   Trying 10.96.16.165:8443...
* Connected to operator-controller-controller-manager-metrics-service.olmv1-system.svc.cluster.local (10.96.16.165) port 8443 (#0)
* ALPN: offers h2
* ALPN: offers http/1.1
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS handshake, Client hello (1):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Server hello (2):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Certificate (11):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, CERT verify (15):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Finished (20):
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN: server accepted h2
* Server certificate:
*  subject: [NONE]
*  start date: Dec 12 20:40:15 2024 GMT
*  expire date: Mar 12 20:40:15 2025 GMT
*  issuer: CN=olmv1-ca
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multiplexing
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* h2h3 [:method: GET]
* h2h3 [:path: /metrics]
* h2h3 [:scheme: https]
* h2h3 [:authority: operator-controller-controller-manager-metrics-service.olmv1-system.svc.cluster.local:8443]
* h2h3 [user-agent: curl/7.87.0-DEV]
* h2h3 [accept: */*]
* h2h3 [authorization: Bearer TOKEN
* Using Stream ID: 1 (easy handle 0xffff82fa3aa0)
> GET /metrics HTTP/2
> Host: operator-controller-controller-manager-metrics-service.olmv1-system.svc.cluster.local:8443
> user-agent: curl/7.87.0-DEV
> accept: */*
> authorization: Bearer TOKEN 
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
< HTTP/2 200 
< content-type: text/plain; version=0.0.4; charset=utf-8; escaping=values
< date: Thu, 12 Dec 2024 20:43:10 GMT
< 
# HELP certwatcher_read_certificate_errors_total Total number of certificate read errors
# TYPE certwatcher_read_certificate_errors_total counter
certwatcher_read_certificate_errors_total 0
# HELP certwatcher_read_certificate_total Total number of certificate reads
# TYPE certwatcher_read_certificate_total counter
certwatcher_read_certificate_total 1
# HELP controller_runtime_active_workers Number of currently used workers per controller
# TYPE controller_runtime_active_workers gauge
....
workqueue_work_duration_seconds_count{controller="clusterextension",name="clusterextension"} 0
* Connection #0 to host operator-controller-controller-manager-metrics-service.olmv1-system.svc.cluster.local left intact

Now, to validate the call with the certificates

Create the POD with the secret

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: curl-metrics
  namespace: olmv1-system
spec:
  serviceAccountName: operator-controller-controller-manager
  containers:
  - name: curl
    image: curlimages/curl:7.87.0
    command:
    - sh
    - -c
    - sleep 3600
    volumeMounts:
    - mountPath: /tmp/cert
      name: olm-cert
      readOnly: true
  volumes:
  - name: olm-cert
    secret:
      secretName: olmv1-cert
  restartPolicy: Never
EOF
$ kubectl exec -it curl-metrics -n olmv1-system -- sh

$ curl -v --cacert /tmp/cert/ca.crt --cert /tmp/cert/tls.crt --key /tmp/cert/t
ls.key \
> -H "Authorization: Bearer $TOKEN" \
> https://operator-controller-controller-manager-metrics-service.olmv1-system.sv
c.cluster.local:8443/metrics
*   Trying 10.96.16.165:8443...
* Connected to operator-controller-controller-manager-metrics-service.olmv1-system.svc.cluster.local (10.96.16.165) port 8443 (#0)
* ALPN: offers h2
* ALPN: offers http/1.1
*  CAfile: /tmp/cert/ca.crt
*  CApath: none
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS handshake, Client hello (1):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Server hello (2):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Certificate (11):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, CERT verify (15):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Finished (20):
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN: server accepted h2
* Server certificate:
*  subject: [NONE]
*  start date: Dec 12 20:40:15 2024 GMT
*  expire date: Mar 12 20:40:15 2025 GMT
*  subjectAltName: host "operator-controller-controller-manager-metrics-service.olmv1-system.svc.cluster.local" matched cert's "operator-controller-controller-manager-metrics-service.olmv1-system.svc.cluster.local"
*  issuer: CN=olmv1-ca
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* h2h3 [:method: GET]
* h2h3 [:path: /metrics]
* h2h3 [:scheme: https]
* h2h3 [:authority: operator-controller-controller-manager-metrics-service.olmv1-system.svc.cluster.local:8443]
* h2h3 [user-agent: curl/7.87.0-DEV]
* h2h3 [accept: */*]
* h2h3 [authorization: Bearer TOKEN
* Using Stream ID: 1 (easy handle 0xffffb0cf9aa0)
> GET /metrics HTTP/2
> Host: operator-controller-controller-manager-metrics-service.olmv1-system.svc.cluster.local:8443
> user-agent: curl/7.87.0-DEV
> accept: */*
> authorization: Bearer TOKEN
> 
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
< HTTP/2 200 
< content-type: text/plain; version=0.0.4; charset=utf-8; escaping=values
< date: Thu, 12 Dec 2024 20:46:29 GMT
< 

@camilamacedo86 camilamacedo86 requested a review from a team as a code owner November 17, 2024 10:19
Copy link

netlify bot commented Nov 17, 2024

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit fccc728
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/6764843156b32600086cd7c6
😎 Deploy Preview https://deploy-preview-1475--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@camilamacedo86 camilamacedo86 force-pushed the replace-rbac-protect-metrics branch 3 times, most recently from 4349f0e to 48dc64a Compare November 17, 2024 10:22
Copy link

codecov bot commented Nov 17, 2024

Codecov Report

Attention: Patch coverage is 51.66667% with 29 lines in your changes missing coverage. Please review.

Project coverage is 74.13%. Comparing base (b67bd38) to head (fccc728).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
cmd/manager/main.go 51.66% 24 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1475      +/-   ##
==========================================
- Coverage   74.65%   74.13%   -0.52%     
==========================================
  Files          42       42              
  Lines        3271     3329      +58     
==========================================
+ Hits         2442     2468      +26     
- Misses        653      679      +26     
- Partials      176      182       +6     
Flag Coverage Δ
e2e 52.11% <51.66%> (-0.04%) ⬇️
unit 56.86% <0.00%> (-1.11%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

cmd/manager/main.go Outdated Show resolved Hide resolved
@camilamacedo86 camilamacedo86 force-pushed the replace-rbac-protect-metrics branch from 48dc64a to 3487ab3 Compare November 18, 2024 12:02
cmd/manager/main.go Outdated Show resolved Hide resolved
@joelanford
Copy link
Member

/hold

This doesn't seem like a change that should be made after we've released 1.0.0-rc1. Let's wait until after we cut 1.0.0.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 18, 2024
@camilamacedo86 camilamacedo86 force-pushed the replace-rbac-protect-metrics branch from 3487ab3 to a7ace73 Compare November 18, 2024 13:35
@camilamacedo86

This comment was marked as resolved.

@joelanford
Copy link
Member

Let's prioritize a design for this. I do not want to add any risk to our 1.0.0 in the final week after we've already released an RC.

@camilamacedo86
Copy link
Contributor Author

/hold

@camilamacedo86 camilamacedo86 changed the title ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization (WIP) - POC - RFC ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization Nov 19, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 19, 2024
@camilamacedo86 camilamacedo86 force-pushed the replace-rbac-protect-metrics branch from 938fb19 to 7e22db8 Compare November 19, 2024 19:06
@camilamacedo86 camilamacedo86 changed the title (WIP) - POC - RFC ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization (WIP) - ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization Nov 19, 2024
@camilamacedo86 camilamacedo86 force-pushed the replace-rbac-protect-metrics branch 4 times, most recently from aaf8ce9 to e3b9df9 Compare November 19, 2024 19:38
@camilamacedo86 camilamacedo86 force-pushed the replace-rbac-protect-metrics branch 5 times, most recently from a5d0cbc to 1488a45 Compare December 19, 2024 17:57
@camilamacedo86 camilamacedo86 force-pushed the replace-rbac-protect-metrics branch from 1488a45 to 423bf5d Compare December 19, 2024 18:02
Copy link
Member

@LalatenduMohanty LalatenduMohanty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 19, 2024
@LalatenduMohanty
Copy link
Member

/hold for @trgeiger to take a look as well

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 19, 2024
cmd/manager/main.go Outdated Show resolved Hide resolved
cmd/manager/main.go Outdated Show resolved Hide resolved
…ontroller-runtime feature

Utilise Controller-Runtime's WithAuthenticationAndAuthorization feature to protect the metrics endpoint. This approach provides access control, similar to the functionality of kube-rbac-proxy. kube-rbac-proxy image from gcr.io/kubebuilder/kube-rbac-proxy is deprecated and should no longer be used

More info: kubernetes-sigs/kubebuilder#3907
@camilamacedo86 camilamacedo86 force-pushed the replace-rbac-protect-metrics branch from 423bf5d to fccc728 Compare December 19, 2024 20:38
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Dec 19, 2024
Copy link
Contributor

@trgeiger trgeiger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@camilamacedo86
Copy link
Contributor Author

/hold false

@LalatenduMohanty
Copy link
Member

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 19, 2024
Copy link
Member

@LalatenduMohanty LalatenduMohanty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@camilamacedo86 camilamacedo86 added this pull request to the merge queue Dec 19, 2024
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 19, 2024
Merged via the queue into operator-framework:main with commit 10f0f77 Dec 19, 2024
17 of 19 checks passed
@camilamacedo86 camilamacedo86 deleted the replace-rbac-protect-metrics branch December 19, 2024 21:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants