[Feature]: As a Korifi Operator I want to be able to understand the health of my Korifi System #3665
We believe that in the k8s world there are solutions (such as OpenTelemetry) that would be far superior to, and more flexible than, whatever we could come up with in Korifi. That is why we have always considered observability and telemetry out of scope for Korifi. Of course, Korifi should implement metrics endpoints as defined by the CF API (such as getting process stats), but anything outside of the specification should probably be achieved via superior, k8s-native tools. We are open to discussion, of course. If you are willing to spend some time on this yourself, you could come up with a proposal, and even PRs. You could also consider building a separate component that provides the metrics you find useful; if you decide to open-source it, the community could benefit from your work. What do you think? cc @georgethebeatle @zabanov-lab
I completely agree with this and there is no confusion or question on that aspect.
Exactly. The Korifi API needs a metrics endpoint that provides specific metrics, like the CF API does. I believe many of the metrics emitted by the Cloud Controller, for example, would make sense in the korifi-api as well. Routing metrics are another example; these could most likely be mapped against the metrics coming from Contour and Envoy, but there would also be custom metrics, like for example Many of these metrics are present in the metrics server, and probably in Envoy under its own terms and conventions. So another aspect will probably be to map them against the equivalent metrics that operators are used to using and seeing in a traditional CF deployment.
I don't think we need everything that cf-on-vms users are used to, but we do need a monitoring and operations guide for Korifi. I guess most of what we need is buried somewhere deep in Kubernetes, but we need to describe it and give it context and meaning. Once we have proper documentation of what the metrics mean for the Korifi components, we can talk about monitoring and operational procedures.
Could you point us to the metrics you are referring to? Reading Accessing metrics Maybe the correct solution here is to implement the Log Cache API (as the cf CLI currently assumes it is there) for k8s in a separate component and just make korifi's
Honestly, as of today we do not have a clear idea of how to implement observability properly, and we (the Korifi maintainers) do not have the capacity to explore it right now. However, any thoughts and proposals are welcome.
What I meant was from the perspective of components and operational metrics, like the ones for the Cloud Controller and Routing: https://docs.cloudfoundry.org/running/all_metrics.html#cc Yes, most of the CF component metrics are no longer relevant here, as those components are being replaced by a set of controllers and CRDs, but many of their metrics were used for the operational aspects of the landscapes. For example, the Diego metric about the total amount would have helped to understand whether the current number of Diego cells is enough. I guess the analogous thing here would be the worker nodes in the data plane, but that analogy needs to be built up and mapped. So that's what I meant from an operational aspect. This will be an evolving topic, I understand that; it's probably not the focus now. I added the ticket for future reference.
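To make the worker-node analogy concrete, here is a minimal Go sketch of a capacity figure in the spirit of Diego's cell capacity metrics (such as `rep.CapacityRemainingMemory`): unclaimed schedulable memory summed over worker nodes, plus how many more instances of a given size would fit. The `nodeCapacity` type and both functions are hypothetical. In a real cluster the inputs could come from node allocatable resources and pod resource requests (for example the `kube_node_status_allocatable` and `kube_pod_container_resource_requests` series from kube-state-metrics). The sketch ignores CPU, ephemeral storage, and scheduling constraints such as taints and affinity.

```go
package main

// nodeCapacity is a hypothetical, simplified view of one worker node:
// how much memory the scheduler may allocate on it and how much is
// already claimed by pod resource requests (both in MiB).
type nodeCapacity struct {
	Name           string
	AllocatableMiB int64
	RequestedMiB   int64
}

// freeMiB returns the unclaimed memory on a node, clamped at zero so an
// over-committed node never subtracts from the capacity of other nodes.
func (n nodeCapacity) freeMiB() int64 {
	if free := n.AllocatableMiB - n.RequestedMiB; free > 0 {
		return free
	}
	return 0
}

// remainingMemoryMiB sums the unclaimed memory across all worker nodes.
func remainingMemoryMiB(nodes []nodeCapacity) int64 {
	var total int64
	for _, n := range nodes {
		total += n.freeMiB()
	}
	return total
}

// instancesThatFit counts how many more app instances, each requesting
// instanceMiB, could be placed. It works per node, because a single
// instance cannot be split across nodes.
func instancesThatFit(nodes []nodeCapacity, instanceMiB int64) int64 {
	if instanceMiB <= 0 {
		return 0
	}
	var count int64
	for _, n := range nodes {
		count += n.freeMiB() / instanceMiB
	}
	return count
}
```

With 2048, 0 and 3072 MiB free on three nodes, the total is 5120 MiB, yet no 4096 MiB instance fits anywhere. This is why the per-node count is arguably the more useful signal when asking whether there are enough worker nodes.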
@vipinvkmenon I guess we'll have to combine what we get from the k8s monitoring tools with what we get from the workloads themselves (either Korifi CRDs or CF apps) and give it meaning in the context of Korifi. I've done a comparison of
I've been trying out the Custom Resource Metrics from kube-state-metrics. They seem like a good candidate to start with. At least some of the metrics around
Blockers/Dependencies
Currently, no metrics are exposed by the Korifi API. Some metrics come from the controllers, but none from the Korifi API Pod. I did not see a /metrics endpoint for the Korifi API (I could be wrong and missing it; if so, please point me to the port and endpoint :) ). While I agree that the CF-Korifi architecture is completely different from CF on BOSH, there would be custom metrics (just like in the CF BOSH deployment) that overlap, e.g. Total LRPs (equivalent to total Pods), jobs, etc.
It would be ideal for these to be converted to CF-specific metrics, rather than obtained directly from the Kube Controller. Some metrics are
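As a rough illustration of turning generic Kubernetes data into a CF-level figure such as total LRPs, the Go sketch below groups app pods by app GUID and counts running versus total instances. It assumes app instance pods carry a `korifi.cloudfoundry.org/app-guid` label; that label key, the `podInfo` and `lrpCount` types, and `countLRPs` are assumptions made for illustration, so check which labels your Korifi version actually sets before relying on them.

```go
package main

import "sort"

// appGUIDLabel is an ASSUMED label key identifying the CF app a pod
// belongs to; verify it against the labels your Korifi version sets.
const appGUIDLabel = "korifi.cloudfoundry.org/app-guid"

// podInfo is a hypothetical minimal view of a pod.
type podInfo struct {
	Labels map[string]string
	Phase  string // Kubernetes pod phase, e.g. "Running" or "Pending"
}

// lrpCount holds per-app instance counts in CF terms.
type lrpCount struct {
	AppGUID string
	Running int
	Total   int
}

// countLRPs groups pods by app GUID and counts running and total
// instances, returning one entry per app sorted by GUID.
func countLRPs(pods []podInfo) []lrpCount {
	byApp := map[string]*lrpCount{}
	for _, p := range pods {
		guid, ok := p.Labels[appGUIDLabel]
		if !ok {
			continue // not a CF app instance (e.g. a Korifi component pod)
		}
		c, seen := byApp[guid]
		if !seen {
			c = &lrpCount{AppGUID: guid}
			byApp[guid] = c
		}
		c.Total++
		if p.Phase == "Running" {
			c.Running++
		}
	}
	out := make([]lrpCount, 0, len(byApp))
	for _, c := range byApp {
		out = append(out, *c)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].AppGUID < out[j].AppGUID })
	return out
}
```

Summing `Total` across the result gives a cluster-wide instance count; pods without the assumed label, such as Korifi's own components, are skipped.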
Background
As a CF Operator
I want custom metrics that are specific to CF, rather than manually mapping or using all the generic metrics of Kubernetes,
So that I can understand the overall health of my CF as a platform.
Acceptance Criteria
GIVEN Korifi Deployment
WHEN I query /metrics of the Korifi API Pod
THEN I see the custom metrics that are specific to CF.
Dev Notes
No response