-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Metricbeat][Kubernetes] Share watchers between metricsets #37332
Conversation
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
|
- remove nolint:all
// NewResourceMetadataEnricher returns an Enricher configured for kubernetes resource events | ||
// getExtraWatchers returns a list of the extra resources to watch based on some resource. | ||
// The full list can be seen in https://github.com/elastic/beats/issues/37243, at Expected Watchers section. | ||
func getExtraWatchers(resourceName string, config *kubernetesConfig) []string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also as a general comment all this functions I think can move to util folder and keep same structure (as we have util.NewResourceMetadataEnricher), because now we go back and forth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean in a different file inside util
? They are in the util
folder right now.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without having checked the code line by line, I believe that this approach is not aligned with kubeStateMetricsCache and kubeletStatsCache approach where we try to solve a similar issue.
GetStateMetricsFamilies(prometheus p.Prometheus) ([]*p.MetricFamily, error) |
The watches need to be set in a module level. We need to take into account elastic-agent where each metricset starts as a new kubernetes module. So when we need to share stuff between metricsets of kubernetes module, we need to define a global cache on Module's level shared across all Module's instances.
See @ChrsMark implementation in #25640 (comment)
I think maybe we should move those new decisions to a new PR @gizas |
I think this approach still works this way and does basically the same thing. It is harder to use I added unit tests for every function, and they work just fine. @MichaelKatsoulis Edit: It is the same approach we are already using for |
You need to define a watchersCache in utils like we do with MetricsRepo
Then its pointer can be passed to NewResourceMetadataEnricher like metricsRepo. Did you test how many watchers are created under the hood with e2e tests? Did you test elastic-agent and metricbeat? |
Yes, the number of watchers are correct. I posted the results in Results in the description from running metricbeat. I also added the unit tests. EA was also tested, results are now in the description. @MichaelKatsoulis Any test in specific I should do? Any specific situation? |
💔 Build Failed
Failed CI StepsHistory
cc @constanca-m |
💚 Build Succeeded
History
cc @constanca-m |
💚 Build Succeeded
History
cc @constanca-m |
💚 Build Succeeded
History
cc @constanca-m |
💚 Build Succeeded
History
cc @constanca-m |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regarding the documentation:
- I think it makes more sense to keep structs and function parameters details as comments in the code, otherwise there is high chances this doc will be not kept in sync with code changes. I believe it would be helpful to keep implementation details in the code, and
enrichers.md
for more high level overview - could the documentation be restructured to have sub-topics? it is hard to read 100+ lines of text as one piece
- could maybe be added a high level architecture diagram? It would simplify the documentation a lot
|
||
- The following description is irrelevant to the metadata enrichment that happens due to the `add_kubernetes_metadata` processor and Kubernetes provider. | ||
- The `add_kubernetes_metadata` processor is skipped in cases of Kubernetes module for metrics collection. It is only used in the case of logs collection when the Kubernetes autodiscover provider is not used. | ||
-The Kubernetes autodiscover provider enriches with metadata mainly when it comes to log collection when it is configured. It is by default in the `container_logs` integration in the Elastic Agent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not only
?
-The Kubernetes autodiscover provider enriches with metadata mainly when it comes to log collection when it is configured. It is by default in the `container_logs` integration in the Elastic Agent. | |
-The Kubernetes autodiscover provider enriches with metadata only when it comes to log collection when it is configured. It is by default in the `container_logs` integration in the Elastic Agent. |
maybe instead of referring to different project (Elastic Agent) - add link to the filebeat https://github.com/elastic/beats/blob/main/deploy/kubernetes/filebeat-kubernetes.yaml#L133-L150?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The kubernetes provider is also used with templates to start a metricbeat module (https://www.elastic.co/guide/en/beats/metricbeat/current/configuration-autodiscover.html#configuration-autodiscover) and enrich the metrics with kubernetes metadata
like state_pod, state_container, node, state_job etc. | ||
- The shared watchers are stored in the module-level struct [resourceWatchers](https://github.com/constanca-m/beats/blob/8fcdfc205ebe0221ed7a1046c673c9babfb5280d/metricbeat/module/kubernetes/kubernetes.go#L102). The struct is initialized when the Kubernetes module [starts](https://github.com/constanca-m/beats/blob/8fcdfc205ebe0221ed7a1046c673c9babfb5280d/metricbeat/module/kubernetes/util/kubernetes.go#L124). | ||
- Each time a watcher gets created it is added to the struct along with some watcher important data. The [struct](https://github.com/constanca-m/beats/blob/8fcdfc205ebe0221ed7a1046c673c9babfb5280d/metricbeat/module/kubernetes/util/kubernetes.go#L99) consists of a metaWatchersMap which has as key the name of the resource this watcher is watching(i.e. pod or node) and as value a [metaWatcher](https://github.com/constanca-m/beats/blob/8fcdfc205ebe0221ed7a1046c673c9babfb5280d/metricbeat/module/kubernetes/util/kubernetes.go#L86C6-L86C17) struct. | ||
- The metaWatcher struct holds all the relevant info for the specific watcher. These info include |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move such detailed explanation on the metaWatcher
struct to the code, otherwise there is high chances this doc will be not kept in sync with code changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These details of the variables are already in the code like this:
watcher kubernetes.Watcher
started bool // true if watcher has started, false otherwise
metricsetsUsing []string // list of metricsets using this watcher
enrichers map[string]*enricher // map of enrichers using this watcher. The key is the metricset name
metadataObjects map[string]bool // map of ids of each object received by the handler functions
nodeScope bool // whether this watcher is only for current node
restartWatcher kubernetes.Watcher // whether this watcher needs a restart
- If it does not exist it creates it and [stores](https://github.com/constanca-m/beats/blob/8fcdfc205ebe0221ed7a1046c673c9babfb5280d/metricbeat/module/kubernetes/util/kubernetes.go#L288C3-L288C18) it in the `resourceWatchers` struct. When creating a new metaWatcher struct, thet `started` options set to false as the watcher has not been started yet(just created). The `metadataObjects`, `enrichers` and `metricsetsUsing` initiliazied to empty, the `restartWatcher` set to nil and the `nodeScope` according to the function input(except from cases when it is an extra watcher where it is hardcoded to false). | ||
- If the watcher for that resource already exists, we check if it needs to be restarted. This situation can arise in specific cases. For instance, if a pod watcher has been created from a metricset like pod or container with nodeScope set to true (watching only from one node), and then another metricset like state_pod or state_container tries to create a pod watcher (which can happen only in the leader node), the watcher already exists. However, if we don't take any action in this scenario, the state_pod watcher would use a watcher that watches only from one node, leading to missing metadata, as described earlier. To address this, we need to update the watcher, mainly changing its watch options (removing options.Node). Unfortunately, a running watcher cannot be updated directly. Instead, we must stop it and create a new one with the correct watch options. The new restartWatcher must be identical to the old watcher, including the same handler function (more on that later), with the only difference being in the watch options. Consequently, the metaWatcher struct of the watcher gets updated with the restartWatcher. The process of stopping the old watcher and starting the new one is handled later on. | ||
- After createAllWatchers function creates all the watchers that are needed and update the resourceWatchers struct, the code flow returns to NewResourceMetadataEnricher | ||
- Now, let's delve into the creation of metadata generators and handler functions. Watchers, on their own, are responsible for subscribing to the Kubernetes API and monitoring changes in the resources they are configured to watch. Their primary function is to update their internal cache or store. However, to determine what actions to take when a change occurs, we rely on the so-called event handlers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can maybe this list be structured somehow with sub-titles? for example metadata generation
can be split to another list?
// Enrich the given list of events | ||
Enrich([]mapstr.M) | ||
} | ||
|
||
type enricher struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add comments regarding this struct and the struct fields here, instead of in documentation - to keep the implementation details in code and the high level overview in the https://github.com/elastic/beats/blob/a7f709a153a66bdb5369128342bd21c36da6439e/metricbeat/module/kubernetes/util/enrichers.md instead ?
} else if resourceMetaWatcher.nodeScope != nodeScope && resourceMetaWatcher.nodeScope { | ||
// It might happen that the resourceMetaWatcher already exists, but is only being used to monitor the resources | ||
// of a single node. In that case, we need to check if we are trying to create a new resourceMetaWatcher that will track | ||
// the resources of multiple nodes. If it is the case, then we need to update the resourceMetaWatcher. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, we need to check if we are trying to create a new resourceMetaWatcher that will track the resources of multiple nodes.
it is not the multiple nodes
, but rather the whole cluster/clusterScope, right?
Such behavior is expected only on the leader node, right?
what will happen in case of the re-election, is it needed in that case to switch from clusterScope to nodeScope (now this condition would only work for the nodeScope: true
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is not the multiple nodes, but rather the whole cluster/clusterScope, right?
You are right
what will happen in case of the re-election, is it needed in that case to switch from clusterScope to nodeScope
Under what conditions the re-election happens ? I think it only happens if an agent is down/restarted, so in that case the whole metricsets initialisation will take place again and the ones with clusterscope will not start at this node.
-The Kubernetes autodiscover provider enriches with metadata mainly when it comes to log collection when it is configured. It is by default in the `container_logs` integration in the Elastic Agent. | ||
- Metadata enrichment from the enrichers happens for all the following Kubernetes metricsets: `state_namespace`, `state_node`, `state_deployment`, `state_daemonset`, `state_replicaset`, `state_pod`, `state_container`, `state_job`, `state_cronjob`, `state_statefulset`, `state_service`, `state_persistentvolume`, `state_persistentvolumeclaim`, `state_storageclass`, `pod`, `container`, `node`. | ||
- The reason that these metricsets trigger the metadata enrichment is because of the way they start. | ||
- All `state_metricsets` (except `state_container`) trigger the shared [kubernetes.Init](https://github.com/constanca-m/beats/blob/8fcdfc205ebe0221ed7a1046c673c9babfb5280d/metricbeat/module/kubernetes/state_daemonset/state_daemonset.go#L45) function when they get initialized. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
almost all the links are from the private repo - https://github.com/constanca-m/beats, that can be outdated quite soon or removed when the feature branch will be merged in
- The pod watcher event handler will only call the updateFunc of the one enricher it has, when triggered | ||
- A new pod appears, the watcher handler function gets triggered and it executes the updatefunc of only the pod metricset's enricher | ||
- So the pod metricset's enricher's metadata map will have the metadata for all pod resources | ||
- state_pod metricsets gets initiliazied. This happens in leader node. First pod metricsets starts and then state_pod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pod metricsets starts and then state_pod
does it always happen like this? or it is an example here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It always happens. The reason is that for the state_* metricsets to start , the leader election needs to take place first which takes sometime.
- A new enricher for state_pod gets created, it uses the same existing pod watcher. Inside buildMetadataEnricher the watcher.enrichers map gets updated to include the state_pod enricher as well | ||
- So whenever a new pod appears or gets updated, both enricher's updateFunc will be triggered, so both enrichers' metadata map will be up to date | ||
- But what happens with pods that triggered a watcher event, in between the pod and state_pod initilization. So after the pod metricsets got initiliazed and before the state_pod got initiliazed | ||
- This is very common. The moment a watcher starts watching(from pod metricset) it immidiately gets notified for all pods in the cluster and executes whatever its updateFunc says. In that case the pod enricher's updateFunc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case the pod enricher's updateFunc
this sentence is not finished, no?
- This is very common. The moment a watcher starts watching(from pod metricset) it immidiately gets notified for all pods in the cluster and executes whatever its updateFunc says. In that case the pod enricher's updateFunc | ||
- So when the watcher got updated with state_pod enricher, the events for the existing pods have already arrived and at that point the watcher did not call the state_pod's enricher updateFunc | ||
- Outcome is that the state_pod enricher's metadata map won't have the existing pods metadata, because the metadata generate method of that enricher was never called | ||
- So we need a way to handle those cases |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it an explanation of the restartWatcher
? this seems to be not finished
Co-authored-by: Tetiana Kravchenko <tanya.kravchenko.v@gmail.com>
This pull request is now in conflicts. Could you fix it? 🙏
|
- The reason that these metricsets trigger the metadata enrichment is because of the way they start. | ||
- All `state_metricsets` (except `state_container`) trigger the shared [kubernetes.Init](https://github.com/constanca-m/beats/blob/8fcdfc205ebe0221ed7a1046c673c9babfb5280d/metricbeat/module/kubernetes/state_daemonset/state_daemonset.go#L45) function when they get initialized. | ||
- [kubernetes.Init](https://github.com/constanca-m/beats/blob/8fcdfc205ebe0221ed7a1046c673c9babfb5280d/metricbeat/helper/kubernetes/state_metricset.go#L44) calls the [New metricsets method](https://github.com/constanca-m/beats/blob/8fcdfc205ebe0221ed7a1046c673c9babfb5280d/metricbeat/helper/kubernetes/state_metricset.go#L80), which returns a new metricset with resource metadata enricher as part of it. | ||
- Node, pod, container, and `state_container` metricsets do not trigger the `kubernetes.Init` rather they implement their own [New method](https://github.com/constanca-m/beats/blob/8fcdfc205ebe0221ed7a1046c673c9babfb5280d/metricbeat/module/kubernetes/node/node.go#L66) with the enricher as part of the metricsets returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all points up until this one are not necessary in this file, and they might add confusion. We just need to say the enrichers are created in the metricset struct. We talk a lot about the kubernetes.Init, and I think that code might change in the future, and then this file won't be updated. Can we remove all these points and just add a sentence explaining where we start enrichers? @MichaelKatsoulis
Co-authored-by: Tetiana Kravchenko <tanya.kravchenko.v@gmail.com>
// removeFromMetricsetsUsing removes the metricset from the list of resources using the shared watcher. | ||
// It returns true if element was removed and new size of array. | ||
// The cache should be locked when called. | ||
func removeFromMetricsetsUsing(resourceName string, notUsingName string, resourceWatchers *Watchers) (bool, int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inverse of this function addToMetricsetsUsing
grabs the lock and this function doesn't. That inconsistency is not clear. I would prefer to see that consistent between the two functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the time we call this function, we are already holding the lock. That is why it is not consistent. I will add a comment to the addToMetricsetsUsing
mentioning the lock
Signed-off-by: constanca <constanca.manteigas@elastic.co>
Signed-off-by: constanca <constanca.manteigas@elastic.co>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the additional comment and cleanup of the go.mod file.
Signed-off-by: constanca <constanca.manteigas@elastic.co>
Proposed commit message
Details
The needed watchers for each resource are defined in
getExtraWatchers
function. You can check which are required in the table of section Expected watchers in this issue.We have a global map that saves all the watchers:
The key to this map is the resource name, and the values are defined as:
metricsetsUsing
contains the list of metricsets that are using this watcher. We need this because when the enricher callsStart()
orStop()
, the watchers start/stop. We cannot start a watcher more than once. We only stop a watcher if the list of metricsets using it is empty. We use metricset to avoid conflicts between metricsets that use the same resource, likestate_pod
andpod
.watcher
is the kubernetes watcher for the resource.started
just tells us if the watcher is started. This is mainly needed for theenricher.Start()
and for testing purposes.enrichers
is the list of enrichers for this watcher per each metricsetmetadataEvents
is the resulted metadata events from the resource event handler. Please see the next list, point 6.2, why this is necessaryThe algorithm goes like this when
NewResourceMetadataEnricher
is called:kubernetes.WatchOptions{}
needed for the watcher. If it fails, we stop.Considerations:
UpdateFunc
/addFunc
andDeleteFunc
, we need to save which metricsets and respective enrichers need that handler function. For this, we keep track of the enrichers using a map, and iterate over that map when one of the functions is triggered.AddFunc
is called for one metricset first, and when the other metricset starts, theAddFunc
is no longer triggered. To avoid the loss of metadata, we have the mapmetadataObjects
, that saves the id of the object that triggered the handler function. This way, for each enricher upon creation, we iterate over all this map and using theid
saved there, we get the object from the watcher store. Using this object, we call theupdate
function and ensure all enrichers have up to date metadata.Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.How to test this PR locally
Related issues
Results
Metricbeat
These results come from running metricbeat with this configuration for kubernetes module (all metricsets that launch watchers are enabled, the others are not).
The logs for the watchers initialization will look like this (only
message
field displayed for simplicity):Notice the line with
<----------
: the pod was the resource that created the watcher for namespace, since it is one of the required resources, and it did not exist yet. This is also the reason why we don't see the line"message":"Created watcher state_namespace successfully, created by state_namespace."
, because by the timestate_namespace
is iterating over the needed watchers, they are already created.In Discover:
Elastic Agent
These results come from running EA with this standalone manifest, but with the custom image.
Logs:
These are the logs for starting the watchers (working as expected).
The results for the dashboards are (check if it still works):
Notes for testing
Important things to consider when testing this PR code changes:
state_namespace state_node state_deployment state_daemonset state_replicaset state_pod state_container state_job state_cronjob state_statefulset state_service state_persistentvolume state_persistentvolumeclaim state_storageclass pod container node
a. All events coming from the affected metricsets in Kibana should be enriched with own resource metadata (labels, annotations) and kubernetes node metadata and kubernetes namespace metadata when applicable.
b. When a new metadata(like a new label) is added on a resource(i.e. pod) then the new events from the related metricset(pod, container, state_pod, state_container) should contain the new metadata
c. When a new node or namespace label and annotation is added to a node/namespace, then the events from relevant metricsets(state_node, node or state_namespace) should include the new metadata.
d. The events of the rest of the metricsets(i.e. state_pod or state_deployment) coming from resources in the updated node/namespace won't get the updated node or namespace metadata out of the box.
e. In order for those events to be updated, there should be first an update in the metadata of these resources. For example if a node is labeled then the pods of that node won't get the new node label immediately. In order to get it, we should also add a label to these pod to trigger a watcher event. Then the new events will include the new pod label and node label
f. Test with addition/removal of metadata on pods that run on the leader node and also on the non-leader nodes.