Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization #460

Merged

Conversation

camilamacedo86
Copy link
Contributor

@camilamacedo86 camilamacedo86 commented Nov 18, 2024

This commit removes the use of the kube-rbac-proxy image and replaces it with metrics authentication/authorization provided by controller-runtime. The kube-rbac-proxy image is deprecated and will no longer be maintained, which introduces risks to production environments. For more details, see: kubernetes-sigs/kubebuilder#3907

Motivation: operator-framework/operator-controller#1509

Local Tests

To check the metrics endpoint

To grant the required permissions for metrics access, run:

kubectl create clusterrolebinding catalogd-metrics-binding \
   --clusterrole=catalogd-metrics-reader \
   --serviceaccount=olmv1-system:catalogd-controller-manager

Generate the token for the catalogd-controller-manager service account:

TOKEN=$(kubectl create token catalogd-controller-manager -n olmv1-system)
echo $TOKEN

Run a pod with a debug container to test the metrics endpoint:

kubectl run curl-metrics --rm -it --restart=Never \
  --image=curlimages/curl:7.87.0 -n olmv1-system -- /bin/sh

Checking the metrics

curl -v -k -H "Authorization: Bearer $TOKEN" \
https://catalogd-service.olmv1-system.svc.cluster.local:7443/metrics

Result

/ $ curl -v -k -H "Authorization: Bearer $TOKEN" \
> https://catalogd-service.olmv1-system.svc.cluster.local:7443/metrics
*   Trying 10.96.52.46:7443...
* Connected to catalogd-service.olmv1-system.svc.cluster.local (10.96.52.46) port 7443 (#0)
* ALPN: offers h2
* ALPN: offers http/1.1
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS handshake, Client hello (1):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Server hello (2):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Certificate (11):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, CERT verify (15):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Finished (20):
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN: server accepted h2
* Server certificate:
*  subject: [NONE]
*  start date: Dec 12 22:03:23 2024 GMT
*  expire date: Mar 12 22:03:23 2025 GMT
*  issuer: CN=olmv1-ca
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multiplexing
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* h2h3 [:method: GET]
* h2h3 [:path: /metrics]
* h2h3 [:scheme: https]
* h2h3 [:authority: catalogd-service.olmv1-system.svc.cluster.local:7443]
* h2h3 [user-agent: curl/7.87.0-DEV]
* h2h3 [accept: */*]
* h2h3 [authorization: Bearer TOKEN
> authorization: Bearer TOKEN
> 
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
< HTTP/2 200 
< content-type: text/plain; version=0.0.4; charset=utf-8; escaping=values
< date: Thu, 12 Dec 2024 22:11:24 GMT
< 
# HELP certwatcher_read_certificate_errors_total Total number of certificate read errors
# TYPE certwatcher_read_certificate_errors_total counter
certwatcher_read_certificate_errors_total 0
# HELP certwatcher_read_certificate_total Total number of certificate reads
...
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="9.999999999999999e-05"} 0
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="0.001"} 1
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="0.01"} 2
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="0.1"} 2
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1"} 2
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="10"} 3
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="100"} 3
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="1000"} 3
workqueue_work_duration_seconds_bucket{controller="clustercatalog",name="clustercatalog",le="+Inf"} 3
workqueue_work_duration_seconds_sum{controller="clustercatalog",name="clustercatalog"} 9.588737754
workqueue_work_duration_seconds_count{controller="clustercatalog",name="clustercatalog"} 3
* Connection #0 to host catalogd-service.olmv1-system.svc.cluster.local left intact

To validate the usage of certs within

Create the Pod with the secret

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: curl-metrics
  namespace: olmv1-system
spec:
  serviceAccountName: catalogd-controller-manager
  containers:
  - name: curl
    image: curlimages/curl:7.87.0
    command:
    - sh
    - -c
    - sleep 3600
    volumeMounts:
    - mountPath: /tmp/cert
      name: catalogd-cert
      readOnly: true
  volumes:
  - name: catalogd-cert
    secret:
      secretName: catalogd-service-cert-v1.0.0-rc1-22-gedbc799-dirty
  restartPolicy: Never
EOF

Jump in the curl

kubectl exec -it curl-metrics -n olmv1-system -- sh

Run the curl calling the metrics

curl -v --cacert /tmp/cert/ca.crt \
     --cert /tmp/cert/tls.crt --key /tmp/cert/tls.key \
     -H "Authorization: Bearer $TOKEN" \
     https://catalogd-service.olmv1-system.svc.cluster.local:7443/metrics

Result

$ kubectl exec -it curl-metrics -n olmv1-system -- sh
/ $ export TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
/ $ curl -v --cacert /tmp/cert/ca.crt \
>      --cert /tmp/cert/tls.crt --key /tmp/cert/tls.key \
>      -H "Authorization: Bearer $TOKEN" \
>      https://catalogd-service.olmv1-system.svc.cluster.local:7443/metrics
*   Trying 10.96.52.46:7443...
* Connected to catalogd-service.olmv1-system.svc.cluster.local (10.96.52.46) port 7443 (#0)
* ALPN: offers h2
* ALPN: offers http/1.1
*  CAfile: /tmp/cert/ca.crt
*  CApath: none
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS handshake, Client hello (1):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Server hello (2):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Certificate (11):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, CERT verify (15):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Finished (20):
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN: server accepted h2
* Server certificate:
*  subject: [NONE]
*  start date: Dec 12 22:03:23 2024 GMT
*  expire date: Mar 12 22:03:23 2025 GMT
*  subjectAltName: host "catalogd-service.olmv1-system.svc.cluster.local" matched cert's "catalogd-service.olmv1-system.svc.cluster.local"
*  issuer: CN=olmv1-ca
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* h2h3 [:method: GET]
* h2h3 [:path: /metrics]
* h2h3 [:scheme: https]
* h2h3 [:authority: catalogd-service.olmv1-system.svc.cluster.local:7443]
* h2h3 [user-agent: curl/7.87.0-DEV]
* h2h3 [accept: */*]

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 18, 2024
Copy link

codecov bot commented Nov 18, 2024

Codecov Report

Attention: Patch coverage is 0% with 32 lines in your changes missing coverage. Please review.

Project coverage is 37.20%. Comparing base (cab110e) to head (0424fe4).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
cmd/manager/main.go 0.00% 32 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #460      +/-   ##
==========================================
- Coverage   37.89%   37.20%   -0.70%     
==========================================
  Files          15       15              
  Lines        1235     1258      +23     
==========================================
  Hits          468      468              
- Misses        717      740      +23     
  Partials       50       50              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@camilamacedo86 camilamacedo86 force-pushed the replace-kube-rbac-proxy branch 3 times, most recently from befdef7 to 8ca7c1d Compare November 18, 2024 12:38
@camilamacedo86 camilamacedo86 changed the title WIP replace kube-rbac-proxy ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization Nov 18, 2024
@camilamacedo86 camilamacedo86 marked this pull request as ready for review November 18, 2024 12:40
@camilamacedo86 camilamacedo86 requested a review from a team as a code owner November 18, 2024 12:40
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 18, 2024
@camilamacedo86 camilamacedo86 force-pushed the replace-kube-rbac-proxy branch from 0ee7199 to 3183153 Compare November 18, 2024 13:43
@camilamacedo86 camilamacedo86 changed the title ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization WIP: ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization Nov 18, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 18, 2024
@camilamacedo86 camilamacedo86 force-pushed the replace-kube-rbac-proxy branch from 3183153 to 4ad7f35 Compare November 18, 2024 13:58
@camilamacedo86 camilamacedo86 changed the title WIP: ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization Nov 18, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 18, 2024
@camilamacedo86 camilamacedo86 changed the title ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization (HOLD - WIP) ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization Nov 18, 2024
@camilamacedo86 camilamacedo86 force-pushed the replace-kube-rbac-proxy branch from 4ad7f35 to 42147b6 Compare November 26, 2024 00:11
@camilamacedo86
Copy link
Contributor Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 26, 2024
@camilamacedo86 camilamacedo86 changed the title (HOLD - WIP) ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization Nov 26, 2024
@camilamacedo86 camilamacedo86 force-pushed the replace-kube-rbac-proxy branch from 42147b6 to 0d9cd88 Compare November 26, 2024 00:15
@camilamacedo86 camilamacedo86 changed the title ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization WIP - ⚠️ Replace kube-rbac-proxy with controller-runtime metrics authentication/authorization Nov 26, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 26, 2024
@camilamacedo86 camilamacedo86 force-pushed the replace-kube-rbac-proxy branch 2 times, most recently from 365aa93 to 5c04183 Compare December 2, 2024 12:15
@tmshort

This comment was marked as resolved.

@camilamacedo86
Copy link
Contributor Author

Hi @joelanford

Can we uphold and move forward with this one?

@camilamacedo86 camilamacedo86 force-pushed the replace-kube-rbac-proxy branch 4 times, most recently from 5f6b423 to 4f171d3 Compare December 13, 2024 22:22
@camilamacedo86 camilamacedo86 force-pushed the replace-kube-rbac-proxy branch 2 times, most recently from c186e9e to 36c6797 Compare December 18, 2024 09:19
@camilamacedo86
Copy link
Contributor Author

/hold cancel

@@ -98,7 +99,7 @@ func main() {
caCertDir string
globalPullSecret string
)
flag.StringVar(&metricsAddr, "metrics-bind-address", ":8080", "The address the metric endpoint binds to.")
flag.StringVar(&metricsAddr, "metrics-bind-address", "", "The address for the metrics endpoint. Requires tls-cert and tls-key. (Default: ':7443')")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to edit the descriptions of tls-key and tls-cert to note that they are also for the metrics server. Something like:
""The certificate file used by the catalog and metrics servers. Required to enable the metrics server. Requires tls-key."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done see wdyt .. see if that what you were looking for

…n/authorization

This commit removes the use of the kube-rbac-proxy image and replaces it with metrics authentication/authorization provided by controller-runtime. The kube-rbac-proxy image is deprecated and will no longer be maintained, which introduces risks to production environments. For more details, see: kubernetes-sigs/kubebuilder#3907
@camilamacedo86 camilamacedo86 added this pull request to the merge queue Dec 20, 2024
Merged via the queue into operator-framework:main with commit aa178c3 Dec 20, 2024
14 of 16 checks passed
@camilamacedo86 camilamacedo86 deleted the replace-kube-rbac-proxy branch December 20, 2024 10:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants