diff --git a/content/master/guides/metrics.md b/content/master/guides/metrics.md
new file mode 100644
index 00000000..d46bff2c
--- /dev/null
+++ b/content/master/guides/metrics.md
@@ -0,0 +1,58 @@
+---
+title: Metrics
+weight: 60
+description: "Metrics are essential for monitoring Crossplane's operations, helping to quickly identify and resolve potential issues."
+---
+
+Crossplane produces [Prometheus-style metrics](https://prometheus.io/docs/introduction/overview/#what-are-metrics) for effective monitoring and alerting in your environment.
+These metrics help you identify and resolve potential issues.
+This page explains all the metrics gathered from Crossplane.
+Understanding these metrics helps you maintain the health and performance of your resources.
+This document focuses on Crossplane-specific metrics and doesn't cover standard Go metrics.
+
+To enable the export of metrics, configure the `--set metrics.enabled=true` option in the [Helm chart](https://github.com/crossplane/crossplane/blob/main/cluster/charts/crossplane/README.md#configuration).
+```yaml {label="value",copy-lines="none"}
+metrics:
+  enabled: true
+```
+
+These Prometheus annotations expose the metrics:
+```yaml {label="deployment",copy-lines="none"}
+prometheus.io/path: /metrics
+prometheus.io/port: "8080"
+prometheus.io/scrape: "true"
+```
+
+{{< table "table table-hover table-striped table-sm">}}
+| Metric Name | Description | Further Explanation |
+| --- | --- | --- |
+| {{}}certwatcher_read_certificate_errors_total{{}} | Total number of certificate read errors | |
+| {{}}certwatcher_read_certificate_total{{}} | Total number of certificate reads | |
+| {{}}composition_run_function_seconds_bucket{{}} | Histogram of RunFunctionResponse latency (seconds) | |
+| {{}}controller_runtime_active_workers{{}} | Number of used workers per controller | The number of threads processing jobs from the work queue. |
+| {{}}controller_runtime_max_concurrent_reconciles{{}} | Maximum number of concurrent reconciles per controller | Describes how many reconciles can happen in parallel. |
+| {{}}controller_runtime_reconcile_errors_total{{}} | Total number of reconciliation errors per controller | A counter of reconcile errors. A sharp or continuous rise in this metric might indicate a problem. |
+| {{}}controller_runtime_reconcile_time_seconds_bucket{{}} | Length of time per reconciliation per controller | |
+| {{}}controller_runtime_reconcile_total{{}} | Total number of reconciliations per controller | |
+| {{}}controller_runtime_webhook_latency_seconds_bucket{{}} | Histogram of the latency of processing admission requests | |
+| {{}}controller_runtime_webhook_requests_in_flight{{}} | Current number of admission requests being served | |
+| {{}}controller_runtime_webhook_requests_total{{}} | Total number of admission requests by HTTP status code | |
+| {{}}rest_client_requests_total{{}} | Number of HTTP requests, partitioned by status code, method, and host | |
+| {{}}workqueue_adds_total{{}} | Total number of adds handled by `workqueue` | |
+| {{}}workqueue_depth{{}} | Current depth of `workqueue` | |
+| {{}}workqueue_longest_running_processor_seconds{{}} | The number of seconds the longest running processor for `workqueue` has been running | |
+| {{}}workqueue_queue_duration_seconds_bucket{{}} | How long in seconds an item stays in `workqueue` before being requested | The time it takes from the moment a job enters the `workqueue` until the processing of this job starts. |
+| {{}}workqueue_retries_total{{}} | Total number of retries handled by `workqueue` | |
+| {{}}workqueue_unfinished_work_seconds{{}} | The number of seconds of work that's in progress and hasn't been observed by `work_duration`. Large values mean stuck threads. | |
+| {{}}workqueue_work_duration_seconds_bucket{{}} | How long in seconds processing an item from `workqueue` takes | The time it takes from the moment the job starts until it finishes (either successfully or with an error). |
+| {{}}crossplane_managed_resource_exists{{}} | The number of managed resources that exist | |
+| {{}}crossplane_managed_resource_ready{{}} | The number of managed resources in `Ready=True` state | |
+| {{}}crossplane_managed_resource_synced{{}} | The number of managed resources in `Synced=True` state | |
+| {{}}upjet_resource_ext_api_duration_bucket{{}} | Measures in seconds how long it takes a Cloud SDK call to complete | |
+| {{}}upjet_resource_external_api_calls_total{{}} | The number of external API calls | The number of calls to cloud providers, with labels describing the endpoints and resources. |
+| {{}}upjet_resource_reconcile_delay_seconds_bucket{{}} | Measures in seconds how long the reconciles for a resource are delayed relative to the configured poll periods | |
+| {{}}crossplane_managed_resource_deletion_seconds_bucket{{}} | The time it took to delete a managed resource | |
+| {{}}crossplane_managed_resource_first_time_to_readiness_seconds_bucket{{}} | The time it took for a managed resource to become ready for the first time after creation | |
+| {{}}crossplane_managed_resource_first_time_to_reconcile_seconds_bucket{{}} | The time it took for the controller to detect a managed resource | |
+| {{}}upjet_resource_ttr_bucket{{}} | Measures in seconds the `time-to-readiness` `(TTR)` for managed resources | |
+{{}}
\ No newline at end of file
diff --git a/content/v1.17/guides/metrics.md b/content/v1.17/guides/metrics.md
new file mode 100644
index 00000000..d46bff2c
--- /dev/null
+++ b/content/v1.17/guides/metrics.md
@@ -0,0 +1,58 @@
+---
+title: Metrics
+weight: 60
+description: "Metrics are essential for monitoring Crossplane's operations, helping to quickly identify and resolve potential issues."
+---
+
+Crossplane produces [Prometheus-style metrics](https://prometheus.io/docs/introduction/overview/#what-are-metrics) for effective monitoring and alerting in your environment.
+These metrics help you identify and resolve potential issues.
+This page explains all the metrics gathered from Crossplane.
+Understanding these metrics helps you maintain the health and performance of your resources.
+This document focuses on Crossplane-specific metrics and doesn't cover standard Go metrics.
+
+To enable the export of metrics, configure the `--set metrics.enabled=true` option in the [Helm chart](https://github.com/crossplane/crossplane/blob/main/cluster/charts/crossplane/README.md#configuration).
+```yaml {label="value",copy-lines="none"}
+metrics:
+  enabled: true
+```
+
+These Prometheus annotations expose the metrics:
+```yaml {label="deployment",copy-lines="none"}
+prometheus.io/path: /metrics
+prometheus.io/port: "8080"
+prometheus.io/scrape: "true"
+```
+
+{{< table "table table-hover table-striped table-sm">}}
+| Metric Name | Description | Further Explanation |
+| --- | --- | --- |
+| {{}}certwatcher_read_certificate_errors_total{{}} | Total number of certificate read errors | |
+| {{}}certwatcher_read_certificate_total{{}} | Total number of certificate reads | |
+| {{}}composition_run_function_seconds_bucket{{}} | Histogram of RunFunctionResponse latency (seconds) | |
+| {{}}controller_runtime_active_workers{{}} | Number of used workers per controller | The number of threads processing jobs from the work queue. |
+| {{}}controller_runtime_max_concurrent_reconciles{{}} | Maximum number of concurrent reconciles per controller | Describes how many reconciles can happen in parallel. |
+| {{}}controller_runtime_reconcile_errors_total{{}} | Total number of reconciliation errors per controller | A counter of reconcile errors. A sharp or continuous rise in this metric might indicate a problem. |
+| {{}}controller_runtime_reconcile_time_seconds_bucket{{}} | Length of time per reconciliation per controller | |
+| {{}}controller_runtime_reconcile_total{{}} | Total number of reconciliations per controller | |
+| {{}}controller_runtime_webhook_latency_seconds_bucket{{}} | Histogram of the latency of processing admission requests | |
+| {{}}controller_runtime_webhook_requests_in_flight{{}} | Current number of admission requests being served | |
+| {{}}controller_runtime_webhook_requests_total{{}} | Total number of admission requests by HTTP status code | |
+| {{}}rest_client_requests_total{{}} | Number of HTTP requests, partitioned by status code, method, and host | |
+| {{}}workqueue_adds_total{{}} | Total number of adds handled by `workqueue` | |
+| {{}}workqueue_depth{{}} | Current depth of `workqueue` | |
+| {{}}workqueue_longest_running_processor_seconds{{}} | The number of seconds the longest running processor for `workqueue` has been running | |
+| {{}}workqueue_queue_duration_seconds_bucket{{}} | How long in seconds an item stays in `workqueue` before being requested | The time it takes from the moment a job enters the `workqueue` until the processing of this job starts. |
+| {{}}workqueue_retries_total{{}} | Total number of retries handled by `workqueue` | |
+| {{}}workqueue_unfinished_work_seconds{{}} | The number of seconds of work that's in progress and hasn't been observed by `work_duration`. Large values mean stuck threads. | |
+| {{}}workqueue_work_duration_seconds_bucket{{}} | How long in seconds processing an item from `workqueue` takes | The time it takes from the moment the job starts until it finishes (either successfully or with an error). |
+| {{}}crossplane_managed_resource_exists{{}} | The number of managed resources that exist | |
+| {{}}crossplane_managed_resource_ready{{}} | The number of managed resources in `Ready=True` state | |
+| {{}}crossplane_managed_resource_synced{{}} | The number of managed resources in `Synced=True` state | |
+| {{}}upjet_resource_ext_api_duration_bucket{{}} | Measures in seconds how long it takes a Cloud SDK call to complete | |
+| {{}}upjet_resource_external_api_calls_total{{}} | The number of external API calls | The number of calls to cloud providers, with labels describing the endpoints and resources. |
+| {{}}upjet_resource_reconcile_delay_seconds_bucket{{}} | Measures in seconds how long the reconciles for a resource are delayed relative to the configured poll periods | |
+| {{}}crossplane_managed_resource_deletion_seconds_bucket{{}} | The time it took to delete a managed resource | |
+| {{}}crossplane_managed_resource_first_time_to_readiness_seconds_bucket{{}} | The time it took for a managed resource to become ready for the first time after creation | |
+| {{}}crossplane_managed_resource_first_time_to_reconcile_seconds_bucket{{}} | The time it took for the controller to detect a managed resource | |
+| {{}}upjet_resource_ttr_bucket{{}} | Measures in seconds the `time-to-readiness` `(TTR)` for managed resources | |
+{{}}
\ No newline at end of file