
feat: add CustomResourceMonitor CRD (WIP) #2049

Open · wants to merge 10 commits into base: `main`
211 changes: 211 additions & 0 deletions docs/design/custom-resource-metrics-crd.md
@@ -0,0 +1,211 @@
# Kube-State-Metrics - CustomResourceMonitor CRD Proposal


---

Authors: Catherine Fang (CatherineF-dev@), Christian Schlotter (chrischdi@)

Date: 26 Jun 2023

Target release: v

---

## Table of Contents
- [Glossary](#glossary)
- [Problem Statement](#problem-statement)
- [Goal](#goal)
- [Status Quo](#status-quo)
- [Proposal](#proposal)
- [New flags](#new-flags)
- [CustomResourceMonitor Definition](#customresourcemonitor-definition)
- [Watch and Reconcile on CustomResourceMonitor CRs](#watch-and-reconcile-on-customresourcemonitor-crs)
- [Critical User Journey (CUJ)](#critical-user-journey-cuj)



## Glossary

- kube-state-metrics: “Simple service that listens to the Kubernetes API server
and generates metrics about the state of the objects”
- Custom Resource: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources
- [CustomResourceState](https://github.com/kubernetes/kube-state-metrics/blob/main/docs/customresourcestate-metrics.md) monitoring feature: existing feature which collects custom resource metrics


## Problem Statement

1. Using the CustomResourceState monitoring feature is not user-friendly. The current ways of configuring CustomResourceState monitoring are:
   * `--custom-resource-state-config "inline yaml (see example)"`, or
   * `--custom-resource-state-config-file /path/to/config.yaml`, where the file is either mounted directly or provided via a ConfigMap.

2. The current CustomResourceState monitoring feature doesn't support multiple configuration files.

For example, in a company with 10 teams, each team wants to collect Custom Resource metrics for the Custom Resources it owns.


## Goal

A better UX for collecting custom resource metrics.

## Proposal

Add a custom resource definition (CustomResourceMonitor) that contains `customresourcestate.Metrics`.

kube-state-metrics watches CustomResourceMonitor CRs and concatenates them into one `customresourcestate.Metrics` config, with the same content as what `--custom-resource-state-config` accepts today.
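The concatenation step can be sketched as follows. This is a minimal sketch with simplified stand-in types; the real `customresourcestate.Metrics` type lives in `k8s.io/kube-state-metrics/v2/pkg/customresourcestate` and carries more fields:

```go
package main

import "fmt"

// Resource is a simplified stand-in for one entry of
// customresourcestate.Metrics.Spec.Resources.
type Resource struct {
	GroupVersionKind string
}

// Metrics is a simplified stand-in for customresourcestate.Metrics.
type Metrics struct {
	Resources []Resource
}

// mergeMonitors concatenates the Metrics specs of all watched
// CustomResourceMonitor CRs into a single config, as the proposal describes.
func mergeMonitors(monitors []Metrics) Metrics {
	var merged Metrics
	for _, m := range monitors {
		merged.Resources = append(merged.Resources, m.Resources...)
	}
	return merged
}

func main() {
	// Two teams each ship their own CustomResourceMonitor CR.
	teamA := Metrics{Resources: []Resource{{GroupVersionKind: "myteam.io/v1/Foo"}}}
	teamB := Metrics{Resources: []Resource{{GroupVersionKind: "addons.k8s.io/v1alpha1/FakedNodePool"}}}

	merged := mergeMonitors([]Metrics{teamA, teamB})
	fmt.Println(len(merged.Resources)) // prints 2
}
```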

### New flags
Apart from the two existing flags (`--custom-resource-state-config` and `--custom-resource-state-config-file`), these three flags will be added:
* `--custom_resource_monitor`: whether to watch CustomResourceMonitor CRs or not.
* `--custom_resource_monitor_labels`: only watch CustomResourceMonitor CRs matching the given [labelSelectors](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#list-and-watch-filtering). For example, `environment=production,tier=frontend` selects CustomResourceMonitor CRs that carry both labels. This avoids double collection of custom metrics when multiple kube-state-metrics instances are installed.
* `--custom_resource_monitor_namespaces`: only watch CustomResourceMonitor CRs in the given namespaces.
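A hypothetical invocation with the proposed flags might look as follows (the flags do not exist yet; the values are illustrative):

```sh
kube-state-metrics \
  --custom_resource_monitor \
  --custom_resource_monitor_labels="environment=production,tier=frontend" \
  --custom_resource_monitor_namespaces="kube-system,monitoring"
```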
> **Reviewer (Member):** Should we consider a denylist for namespaces as well? In case someone has a "platform KSM" (only selected namespaces) and a "product KSM" (everything else)?
If `--custom_resource_monitor` is set, `--custom-resource-state-config` and `--custom-resource-state-config-file` will be ignored.
> **Reviewer (Member):** Is there a need to make it mutually exclusive?
>
> **Author:** To simplify UX. The migration from `--custom-resource-state-config`/`--custom-resource-state-config-file` to `--custom_resource_monitor` is simple.


### CustomResourceMonitor Definition

* GroupName: kubestatemetrics.io
  * Alternative kubestatemetrics.k8s.io: 1. `*.k8s.io` names need approval; 2. kube-state-metrics isn't inside the k/k repo (needs to be double-checked)
  > **Reviewer (Member):** Do you know if this really needs approval and by whom? autoscaler also uses `*.k8s.io`: https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/examples/hamster.yaml#L8. I would prefer using ksm.k8s.io or kubestatemetrics.k8s.io.
* Alternative ksm.io: has been used by a company
* Version: v1alpha1
* Kind: CustomResourceMonitor

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	"k8s.io/kube-state-metrics/v2/pkg/customresourcestate"
)

// CustomResourceMonitor embeds the existing customresourcestate.Metrics
// configuration as an inline spec of a namespaced custom resource.
type CustomResourceMonitor struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	customresourcestate.Metrics `json:",inline"`
}
```

```yaml
# Example CR
apiVersion: kubestatemetrics.io/v1alpha1
kind: CustomResourceMonitor
metadata:
  name: test-cr2
  namespace: kube-system
  generation: 1
spec:
  resources:
    - groupVersionKind:
        group: myteam.io
        kind: "Foo"
        version: "v1"
      metrics:
        - name: "uptime"
          help: "Foo uptime"
          each:
            type: Gauge
            gauge:
              path: [status, uptime]
```
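For the `Foo` example above, the resulting metric would look along these lines. The exact metric and label names are assumptions based on the default `kube_customresource_` prefix documented for the existing CustomResourceState feature; the sample value is illustrative:

```
kube_customresource_uptime{customresource_group="myteam.io",customresource_kind="Foo",customresource_version="v1"} 123.45
```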

### Watch and Reconcile on CustomResourceMonitor CRs

Kube-state-metrics listens for add, update, and delete events on CustomResourceMonitor CRs via Kubernetes client-go reflectors. On each of these events, kube-state-metrics lists all CustomResourceMonitor CRs and concatenates them into one `customresourcestate.Metrics` config, almost identical in content to a `--custom-resource-state-config` config.

The generated custom resource config updates the CustomResourceStore by adding stores for newly monitored custom resources and deleting stores for custom resources that are no longer monitored.


```yaml
# Example CR
apiVersion: kubestatemetrics.io/v1alpha1
kind: CustomResourceMonitor
metadata:
  name: nodepool
spec:
  resources:
    - groupVersionKind:
        group: addons.k8s.io
        kind: "FakedNodePool"
        version: "v1alpha1"
      metrics:
        - name: "nodepool_generation"
          help: "Nodepool generation"
          each:
            type: Gauge
            gauge:
              path: [metadata, generation]
```


```
+---------------+ +---------------------+ +-----------------------+
| CRM_informer | | nodepool_reflector | | custom_resource_store |
+---------------+ +---------------------+ +-----------------------+
---------------------\ | | |
| add/update/delete |-| | |
| CustomResource | | | |
| Monitor CR | | | |
| (monitor-nodepool) | | | |
|--------------------| | | |
| | |
| ListAndAddCustomResourceMonitors() |
|-------------------------------------------------->|
| | |
| DeleteOldCustomResourceMonitors() |
|-------------------------------------------------->|
| | |
| | Update(nodepool) |
| |----------------------------->|
| | | ----------\
| | |-| Build() |
| | | |---------|
| | | ----------------------------\
| | |-| generateMetrics(nodepool) |
| | | |---------------------------|
| | |
```

<details>
<summary>Code to reproduce diagram</summary>

Build via [text-diagram](http://weidagang.github.io/text-diagram/)

```
object CRM_informer nodepool_reflector custom_resource_store

note left of CRM_informer: add/update/delete \n CustomResource \n Monitor CR \n(monitor-nodepool)
CRM_informer -> custom_resource_store: ListAndAddCustomResourceMonitors()
CRM_informer -> custom_resource_store: DeleteOldCustomResourceMonitors()


nodepool_reflector -> custom_resource_store: Update(nodepool)

note right of custom_resource_store: Build()

note right of custom_resource_store: generateMetrics(nodepool)
```


</details>

The custom resource store currently always [adds](https://github.com/kubernetes/kube-state-metrics/blob/main/internal/store/builder.go#L186) new custom resource metrics stores; deletion of custom resource metrics still needs to be implemented.
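The missing deletion logic could be sketched as a set difference between the currently registered stores and the desired set after a reconcile. All names here (`diffStores`, the GVK-string keys) are hypothetical, not existing kube-state-metrics APIs:

```go
package main

import "fmt"

// diffStores computes which custom-resource stores to add and which to
// delete, given the GVK sets before and after a CustomResourceMonitor
// reconcile. Keys are "group/version/kind" strings.
func diffStores(current, desired map[string]bool) (toAdd, toDelete []string) {
	for gvk := range desired {
		if !current[gvk] {
			toAdd = append(toAdd, gvk)
		}
	}
	for gvk := range current {
		if !desired[gvk] {
			toDelete = append(toDelete, gvk)
		}
	}
	return toAdd, toDelete
}

func main() {
	// A monitor for FakedNodePool was deleted and one for Foo was created.
	current := map[string]bool{"addons.k8s.io/v1alpha1/FakedNodePool": true}
	desired := map[string]bool{"myteam.io/v1/Foo": true}

	add, del := diffStores(current, desired)
	fmt.Println(add, del) // prints [myteam.io/v1/Foo] [addons.k8s.io/v1alpha1/FakedNodePool]
}
```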

### Alternatives
- Generate the metrics configuration from field annotations: https://github.com/kubernetes/kube-state-metrics/issues/1899
  - Limitation: requires access to the custom resource's source code

## Migrate from CustomResourceState
```
+ apiVersion: kubestatemetrics.io/v1alpha1
- kind: CustomResourceStateMetrics
+ kind: CustomResourceMonitor
+ metadata:
+   name: crm_nodepool
+   labels:
+     monitoring.backend.io: "true"
  spec: # copy the content from --custom-resource-state-config-file
```

## Critical User Journey (CUJ)
* cloud-provider: watch CustomResourceMonitor CRs with label `monitoring.(gke|aks|eks).io=true` under system namespaces
* application platform: watch CustomResourceMonitor CRs with label `monitoring.frontend.io=true` under non-system namespaces
* monitoring platform team: watch CustomResourceMonitor CRs with label `monitoring.platform.io=true` under non-system namespaces
20 changes: 20 additions & 0 deletions examples/cr.yaml
@@ -0,0 +1,20 @@
# TODO: use another CR
apiVersion: customresource.ksm.io/v1alpha1
kind: CustomResourceMonitor
metadata:
  name: test-cr
  namespace: kube-system
  generation: 2
spec:
  resources:
    - groupVersionKind:
        group: addons.gke.io
        kind: "Stackdriver"
        version: "v1alpha1"
      metrics:
        - name: "uptime"
          help: "Foo uptime"
          each:
            type: Gauge
            gauge:
              path: [metadata, generation]
19 changes: 19 additions & 0 deletions examples/cr2.yaml
@@ -0,0 +1,19 @@
apiVersion: customresource.ksm.io/v1alpha1
kind: CustomResourceMonitor
metadata:
  name: test-cr2
  namespace: kube-system
  generation: 1
spec:
  resources:
    - groupVersionKind:
        group: customresource.ksm.io
        kind: "CustomResourceMonitor"
        version: "v1alpha1"
      metrics:
        - name: "uptime2"
          help: "Bar uptime"
          each:
            type: Gauge
            gauge:
              path: [metadata, generation]
41 changes: 34 additions & 7 deletions go.sum
Expand Up @@ -191,11 +191,16 @@ golang.org/x/time v0.5.0 h1:o7cqy6amK/52YcAKIPlM3a+Fpj35zvRj2TP+e1xFSfk=
golang.org/x/time v0.5.0/go.mod h1:3BpzKBy/shNhVucY/MWOyx10tF3SFh9QdLuxbVysPQM=
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
golang.org/x/tools v0.0.0-20200619180055-7c47624df98f/go.mod h1:EkVYQZoAsY45+roYkvgYkIh4xh/qjgUK9TdY2XT94GE=
golang.org/x/tools v0.0.0-20210106214847-113979e3529a/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA=
golang.org/x/tools v0.18.0 h1:k8NLag8AGHnn+PHbl7g43CtqZAwG60vZkLqgyZgIHgQ=
golang.org/x/tools v0.18.0/go.mod h1:GL7B4CwcLLeo59yx/9UWWuNOW1n3VZ4f5axWfML7Lcg=
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/tools v0.0.0-20191125144606-a911d9008d1f/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
golang.org/x/tools v0.0.0-20191130070609-6e064ea0cf2d/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
golang.org/x/tools v0.0.0-20191216173652-a0e659d51361/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28=
golang.org/x/tools v0.0.0-20191227053925-7b8e75db28f4/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28=
golang.org/x/tools v0.0.0-20200117161641-43d50277825c/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28=
golang.org/x/tools v0.0.0-20200122220014-bf1340f18c4a/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28=
golang.org/x/tools v0.0.0-20200130002326-2f3ba24bd6e7/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28=
golang.org/x/tools v0.0.0-20200204074204-1cc6d1ef6c74/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28=
golang.org/x/tools v0.0.0-20200207183749-b753a1ba74fa/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28=
golang.org/x/tools v0.0.0-20200212150539-ea181f53ac56/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28=
golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
Expand Down Expand Up @@ -236,5 +241,27 @@ sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd h1:EDPBXCAspyGV4jQlpZSudPeMm
sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd/go.mod h1:B8JuhiUyNFVKdsE8h686QcCxMaH6HrOAZj4vswFpcB0=
sigs.k8s.io/structured-merge-diff/v4 v4.4.1 h1:150L+0vs/8DA78h1u02ooW1/fFq/Lwr+sGiqlzvrtq4=
sigs.k8s.io/structured-merge-diff/v4 v4.4.1/go.mod h1:N8hJocpFajUSSeSJ9bOZ77VzejKZaXsTtZo4/u7Io08=
sigs.k8s.io/yaml v1.3.0 h1:a2VclLzOGrwOHDiV8EfBGhvjHvP46CtW5j6POvhYGGo=
sigs.k8s.io/yaml v1.3.0/go.mod h1:GeOyir5tyXNByN85N/dRIT9es5UQNerPYEKK56eTBm8=
honnef.co/go/tools v0.0.0-20190102054323-c2f93a96b099/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
honnef.co/go/tools v0.0.0-20190106161140-3f1c8253044a/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
honnef.co/go/tools v0.0.0-20190418001031-e561f6794a2a/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
honnef.co/go/tools v0.0.1-2020.1.3/go.mod h1:X/FiERA/W4tHapMX5mGpAtMSVEeEUOyHaw9vFzvIQ3k=
honnef.co/go/tools v0.0.1-2020.1.4/go.mod h1:X/FiERA/W4tHapMX5mGpAtMSVEeEUOyHaw9vFzvIQ3k=
k8s.io/api v0.27.2 h1:+H17AJpUMvl+clT+BPnKf0E3ksMAzoBBg7CntpSuADo=
k8s.io/api v0.27.2/go.mod h1:ENmbocXfBT2ADujUXcBhHV55RIT31IIEvkntP6vZKS4=
k8s.io/apimachinery v0.27.2 h1:vBjGaKKieaIreI+oQwELalVG4d8f3YAMNpWLzDXkxeg=
k8s.io/apimachinery v0.27.2/go.mod h1:XNfZ6xklnMCOGGFNqXG7bUrQCoR04dh/E7FprV6pb+E=
k8s.io/client-go v0.27.2 h1:vDLSeuYvCHKeoQRhCXjxXO45nHVv2Ip4Fe0MfioMrhE=
k8s.io/client-go v0.27.2/go.mod h1:tY0gVmUsHrAmjzHX9zs7eCjxcBsf8IiNe7KQ52biTcQ=
k8s.io/component-base v0.27.2 h1:neju+7s/r5O4x4/txeUONNTS9r1HsPbyoPBAtHsDCpo=
k8s.io/component-base v0.27.2/go.mod h1:5UPk7EjfgrfgRIuDBFtsEFAe4DAvP3U+M8RTzoSJkpo=
k8s.io/klog/v2 v2.100.1 h1:7WCHKK6K8fNhTqfBhISHQ97KrnJNFZMcQvKp7gP/tmg=
k8s.io/klog/v2 v2.100.1/go.mod h1:y1WjHnz7Dj687irZUWR/WLkLc5N1YHtjLdmgWjndZn0=
k8s.io/kube-openapi v0.0.0-20230501164219-8b0f38b5fd1f h1:2kWPakN3i/k81b0gvD5C5FJ2kxm1WrQFanWchyKuqGg=
k8s.io/kube-openapi v0.0.0-20230501164219-8b0f38b5fd1f/go.mod h1:byini6yhqGC14c3ebc/QwanvYwhuMWF6yz2F8uwW8eg=
k8s.io/sample-controller v0.27.2 h1:KTdiLknxjf0CB4LTTJTGfzJjnqR5QA/pgUQvXJqyw/I=
k8s.io/sample-controller v0.27.2/go.mod h1:WfiHY1M7OODPfq9OX+6Vc3Df+R5A4yWwctYC2og0hPo=
k8s.io/utils v0.0.0-20230505201702-9f6742963106 h1:EObNQ3TW2D+WptiYXlApGNLVy0zm/JIBVY9i+M4wpAU=
k8s.io/utils v0.0.0-20230505201702-9f6742963106/go.mod h1:OLgZIPagt7ERELqWJFomSt595RzquPNLL48iOWgYOg0=
rsc.io/binaryregexp v0.2.0/go.mod h1:qTv7/COck+e2FymRvadv62gMdZztPaShugOCi3I+8D8=
rsc.io/quote/v3 v3.1.0/go.mod h1:yEA65RcK8LyAZtP9Kv3t0HmxON59tX3rD+tICJqUlj0=