Commit
OpenTelemetry Collector Codebundle (#406)
* implementation
* script cleanup and genrule
* Update nextsteps
Showing 8 changed files with 299 additions and 0 deletions.
28 changes: 28 additions & 0 deletions
codebundles/k8s-otelcollector/.runwhen/generation-rules/k8s-otelcollector.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
apiVersion: runwhen.com/v1 | ||
kind: GenerationRules | ||
spec: | ||
  generationRules: | ||
    - resourceTypes: | ||
        - deployment | ||
        - daemonset | ||
        - statefulset | ||
      matchRules: | ||
        - type: and | ||
          matches: | ||
            - type: pattern | ||
              pattern: "opentelemetry-collector" | ||
              properties: [label-values] | ||
              mode: substring | ||
            - type: pattern | ||
              pattern: "col" | ||
              properties: [name] | ||
              mode: substring | ||
      slxs: | ||
        - baseName: k8s-otelcollector | ||
          levelOfDetail: detailed | ||
          qualifiers: ["resource", "namespace", "cluster"] | ||
          baseTemplateName: k8s-otelcollector | ||
          outputItems: | ||
            - type: slx | ||
            - type: runbook | ||
              templateName: k8s-otelcollector-taskset.yaml |
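As a rough illustration of how the two `pattern` matches combine under `type: and`, the sketch below approximates substring matching in plain bash. The function name `matches_otel_rule` and its arguments are hypothetical; the real matching is performed by the RunWhen generation-rule engine, whose implementation is not part of this commit.

```shell
#!/bin/bash
# Hypothetical helper: approximates the `and` of the two substring patterns.
# $1 = resource name, $2 = space-separated label values (both are assumptions).
matches_otel_rule() {
    local name="$1" label_values="$2"
    # properties: [label-values], mode: substring
    [[ "$label_values" == *"opentelemetry-collector"* ]] || return 1
    # properties: [name], mode: substring
    [[ "$name" == *"col"* ]] || return 1
    return 0
}

# Example resource; the name and label value are illustrative only.
if matches_otel_rule "otel-demo-otelcol" "opentelemetry-collector otel-demo"; then
    echo "match"
else
    echo "no match"
fi
```

Because `col` is a short substring, a name such as `protocol-gateway` would also satisfy the name pattern, so in this sketch the label-value pattern does most of the filtering.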
38 changes: 38 additions & 0 deletions
codebundles/k8s-otelcollector/.runwhen/templates/k8s-otelcollector-taskset.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
apiVersion: runwhen.com/v1 | ||
kind: Runbook | ||
metadata: | ||
  name: {{slx_name}} | ||
  labels: | ||
    {% include "common-labels.yaml" %} | ||
  annotations: | ||
    {% include "common-annotations.yaml" %} | ||
spec: | ||
  location: {{default_location}} | ||
  codeBundle: | ||
    {% if repo_url %} | ||
    repoUrl: {{repo_url}} | ||
    {% else %} | ||
    repoUrl: https://github.com/runwhen-contrib/rw-cli-codecollection.git | ||
    {% endif %} | ||
    {% if ref %} | ||
    ref: {{ref}} | ||
    {% else %} | ||
    ref: main | ||
    {% endif %} | ||
    pathToRobot: codebundles/k8s-otelcollector/runbook.robot | ||
  configProvided: | ||
    - name: KUBERNETES_DISTRIBUTION_BINARY | ||
      value: {{custom.kubernetes_distribution_binary}} | ||
    - name: NAMESPACE | ||
      value: {{match_resource.resource.metadata.namespace}} | ||
    - name: CONTEXT | ||
      value: {{context}} | ||
    - name: WORKLOAD_NAME | ||
      value: {{match_resource.resource.kind}}/{{match_resource.resource.metadata.name}} | ||
    - name: WORKLOAD_SERVICE | ||
      value: {{match_resource.resource.metadata.name}} | ||
    - name: METRICS_PORT | ||
      value: 8888 | ||
  secretsProvided: | ||
    - name: kubeconfig | ||
      workspaceKey: {{custom.kubeconfig_secret_name}} |
23 changes: 23 additions & 0 deletions
codebundles/k8s-otelcollector/.runwhen/templates/k8s-otelcollector.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
apiVersion: runwhen.com/v1 | ||
kind: ServiceLevelX | ||
metadata: | ||
  name: {{slx_name}} | ||
  labels: | ||
    {% include "common-labels.yaml" %} | ||
  annotations: | ||
    {% include "common-annotations.yaml" %} | ||
spec: | ||
  imageURL: https://storage.googleapis.com/runwhen-nonprod-shared-images/icons/jaeger_tracing.svg | ||
  alias: OTEL Collector Health for Namespace {{match_resource.resource.metadata.namespace}} | ||
  asMeasuredBy: None | ||
  configProvided: | ||
    - name: OBJECT_NAME | ||
      value: {{match_resource.resource.metadata.name}} | ||
  owners: | ||
    - {{workspace.owner_email}} | ||
  statement: OTEL Collector {{match_resource.resource.metadata.name}} should not have large queues or error logs. | ||
  additionalContext: | ||
    namespace: "{{match_resource.resource.metadata.namespace}}" | ||
    labelMap: "{{match_resource.resource.metadata.labels}}" | ||
    cluster: "{{ cluster.name }}" | ||
    context: "{{ cluster.context }}" |
codebundles/k8s-otelcollector/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
# Kubernetes OpenTelemetry Health Check | ||
Checks the OTEL collector's logs and metrics for signs of poor health, such as large queues or errors. | ||
|
||
Note: if you're having trouble connecting to your OTEL collector, set `WORKLOAD_NAME` to another workload in the namespace that can reach the collector's service. | ||
|
||
## Tasks | ||
`Scan OpenTelemetry Logs For Dropped Spans In Namespace` | ||
|
||
`Check OpenTelemetry Collector Logs For Errors In Namespace` | ||
|
||
`Query Collector Queued Spans in Namespace` | ||
|
||
## Configuration | ||
The TaskSet requires initialization to import necessary secrets, services, and user variables. The following variables should be set: | ||
|
||
- `kubeconfig`: The kubeconfig secret containing access info for the cluster. | ||
- `KUBERNETES_DISTRIBUTION_BINARY`: Which binary to use for Kubernetes CLI commands. Default value is `kubectl`. | ||
- `CONTEXT`: The Kubernetes context to operate within. | ||
- `NAMESPACE`: The name of the namespace that contains the collector. The scripts pass it to `-n`, so it must not be blank. | ||
- `WORKLOAD_SERVICE`: Service name to curl against for metrics. | ||
- `WORKLOAD_NAME`: Workload used for exec requests. | ||
- `METRICS_PORT`: The port to use to request metrics from. | ||
|
||
|
||
## Requirements | ||
- A kubeconfig with RBAC permissions to read logs and exec into workloads in the target namespace. | ||
|
||
## TODO | ||
- [ ] Consider additional tasks | ||
|
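To try the scripts outside the TaskSet, the variables listed under Configuration need to be present in the environment. The sketch below is one possible pre-flight check; `missing_vars` is a hypothetical helper that is not part of this codebundle, and the exported values are examples borrowed from the defaults in `runbook.robot`.

```shell
#!/bin/bash
# Hypothetical pre-flight helper: prints the names of any variables that are unset or empty.
missing_vars() {
    local name missing=""
    for name in "$@"; do
        # ${!name} is bash indirect expansion: the value of the variable named by $name.
        if [ -z "${!name:-}" ]; then
            missing="$missing $name"
        fi
    done
    # Trim the leading space before printing.
    echo "${missing# }"
}

# Example values only; replace them with settings for your own cluster.
export CONTEXT="my-main-cluster"
export NAMESPACE="my-namespace"
export WORKLOAD_SERVICE="otel-demo-otelcol"
export WORKLOAD_NAME="deployment/otel-demo-otelcol"
export METRICS_PORT="8888"

unset_list=$(missing_vars CONTEXT NAMESPACE WORKLOAD_SERVICE WORKLOAD_NAME METRICS_PORT)
if [ -n "$unset_list" ]; then
    echo "Missing configuration: $unset_list" >&2
else
    echo "Configuration looks complete"
fi
```

With the variables exported and `KUBECONFIG` pointing at a valid kubeconfig, a script such as `otel_metrics_check.sh` can then be run directly from the same shell.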
codebundles/k8s-otelcollector/otel_dropped_check.sh
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
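#!/bin/bash | ||
|
||
# Scans the last hour of collector logs for dropped spans; exits 1 if any are found. | ||
# ENV: | ||
# CONTEXT | ||
# NAMESPACE | ||
# METRICS_PORT | ||
# WORKLOAD_NAME | ||
# WORKLOAD_SERVICE | ||
# KUBERNETES_DISTRIBUTION_BINARY (optional, defaults to kubectl) | ||
since=60m | ||
kube_bin="${KUBERNETES_DISTRIBUTION_BINARY:-kubectl}" | ||
# Keep 20 lines after each match so the surrounding error context is captured. | ||
output=$("$kube_bin" --context "$CONTEXT" -n "$NAMESPACE" logs "service/$WORKLOAD_SERVICE" --since="$since" --all-containers=true | grep -A 20 dropped) | ||
if [ -n "$output" ]; then | ||
    echo -E "Dropped Spans Found:" | ||
    echo -E "$output" | ||
    exit 1 | ||
fi | ||
exit 0 |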
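The effect of `grep -A` in the script above can be tried without a cluster. The following sketch feeds made-up log lines through a hypothetical `find_dropped` helper; the log text is invented for illustration and is not real collector output.

```shell
#!/bin/bash
# Hypothetical helper: reads log text on stdin and prints lines containing
# "dropped" plus N lines of trailing context (the script above uses 20).
find_dropped() {
    local context_lines="${1:-2}"
    grep -A "$context_lines" dropped
}

# Made-up log lines for illustration only.
sample_logs='info  exporter started
error sending queue is full, data dropped
error detail: backend unavailable
info  retrying'

# grep exits non-zero when nothing matches, so the branch is only taken on a hit.
if matches=$(printf '%s\n' "$sample_logs" | find_dropped 1); then
    echo "Dropped Spans Found:"
    echo "$matches"
fi
```

The trailing context is what makes the reported issue useful: the line after a `dropped` message often names the cause, which a bare match would omit.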
codebundles/k8s-otelcollector/otel_error_check.sh
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
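#!/bin/bash | ||
|
||
# Scans the last hour of collector logs for error entries; exits 1 if any are found. | ||
# ENV: | ||
# CONTEXT | ||
# NAMESPACE | ||
# METRICS_PORT | ||
# WORKLOAD_NAME | ||
# WORKLOAD_SERVICE | ||
# KUBERNETES_DISTRIBUTION_BINARY (optional, defaults to kubectl) | ||
since=60m | ||
kube_bin="${KUBERNETES_DISTRIBUTION_BINARY:-kubectl}" | ||
output=$("$kube_bin" --context "$CONTEXT" -n "$NAMESPACE" logs "service/$WORKLOAD_SERVICE" --since="$since" --all-containers=true | grep error) | ||
if [ -n "$output" ]; then | ||
    echo -E "Error(s) Found:" | ||
    echo -E "$output" | ||
    exit 1 | ||
fi | ||
exit 0 |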
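The check above matches only the lowercase string `error`. If a collector were configured to log levels in a different case (an assumption; this commit does not say how the collector's logging is configured), a case-insensitive count could be sketched as follows. `count_error_lines` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: counts lines on stdin that mention "error" in any letter case.
count_error_lines() {
    # grep -c prints the count but exits 1 when it is 0, so the status is ignored.
    grep -ci error || true
}

# Made-up log lines for illustration only.
sample_logs='ERROR exporter failed
info  all good
error retry exhausted'

count=$(printf '%s\n' "$sample_logs" | count_error_lines)
echo "error lines: $count"
```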
codebundles/k8s-otelcollector/otel_metrics_check.sh
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
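#!/bin/bash | ||
|
||
# Scrapes the collector metrics endpoint and flags exporter queues above THRESHOLD. | ||
# ENV: | ||
# CONTEXT | ||
# NAMESPACE | ||
# METRICS_PORT | ||
# WORKLOAD_NAME | ||
# WORKLOAD_SERVICE | ||
# KUBERNETES_DISTRIBUTION_BINARY (optional, defaults to kubectl) | ||
|
||
THRESHOLD=500 | ||
rv=0 | ||
kube_bin="${KUBERNETES_DISTRIBUTION_BINARY:-kubectl}" | ||
metrics=$("$kube_bin" --context "$CONTEXT" -n "$NAMESPACE" exec "$WORKLOAD_NAME" -- curl -s "$WORKLOAD_SERVICE:$METRICS_PORT/metrics") | ||
queued_spans=$(echo -E "$metrics" | grep "otelcol_exporter_queue_size{") | ||
while IFS= read -r line; do | ||
    # Skip the empty line the here-string yields when no queue metrics were found. | ||
    [ -z "$line" ] && continue | ||
    echo "$line" | ||
    value=$(echo "$line" | awk '{print $2}') | ||
    # Compare in awk because [ -gt ] fails on floats and scientific notation. | ||
    if awk -v v="$value" -v t="$THRESHOLD" 'BEGIN { exit !(v + 0 > t + 0) }'; then | ||
        echo "Error: queued spans ($value) exceeds threshold ($THRESHOLD)" | ||
        rv=1 | ||
    fi | ||
done <<< "$queued_spans" | ||
exit $rv |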
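The threshold comparison can be exercised against sample metrics without a cluster. The sketch below is a standalone approximation, not the script itself: `queues_over_threshold` is a hypothetical helper, and the metric lines are made up in the Prometheus text format that the collector's `/metrics` endpoint serves.

```shell
#!/bin/bash
# Hypothetical helper: reads Prometheus text-format lines on stdin and prints each
# otelcol_exporter_queue_size sample whose value exceeds the threshold given as $1.
queues_over_threshold() {
    local threshold="$1"
    awk -v t="$threshold" '
        # Match the series by prefix; $2 is the sample value, coerced to a number.
        index($0, "otelcol_exporter_queue_size{") == 1 && ($2 + 0) > (t + 0) { print }
    '
}

# Made-up metrics for illustration only.
sample_metrics='# TYPE otelcol_exporter_queue_size gauge
otelcol_exporter_queue_size{exporter="otlp"} 12
otelcol_exporter_queue_size{exporter="logging"} 1.2e+03'

over=$(printf '%s\n' "$sample_metrics" | queues_over_threshold 500)
if [ -n "$over" ]; then
    echo "Queues above threshold:"
    echo "$over"
fi
```

Doing the comparison in `awk` keeps values such as `1.2e+03` usable, whereas the shell's integer test would reject them.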
codebundles/k8s-otelcollector/runbook.robot
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,124 @@ | ||
*** Settings *** | ||
Documentation       This taskset performs diagnostic checks on an OpenTelemetry Collector to ensure it's pushing metrics. | ||
Metadata            Author    jon-funk | ||
Metadata            Display Name    K8s OpenTelemetry Collector Health | ||
Metadata            Supports    GKE EKS AKS Kubernetes OpenTelemetry otel collector | ||
|
||
Library             BuiltIn | ||
Library             RW.Core | ||
Library             RW.CLI | ||
Library             RW.platform | ||
|
||
Suite Setup         Suite Initialization | ||
|
||
|
||
*** Tasks *** | ||
Query Collector Queued Spans in Namespace `${NAMESPACE}` | ||
    [Documentation]    Query the collector metrics endpoint and inspect queue size | ||
    [Tags]    otel    collector    metrics    queued    back pressure | ||
    ${process}=    RW.CLI.Run Bash File | ||
    ...    bash_file=otel_metrics_check.sh | ||
    ...    env=${env} | ||
    ...    secret_file__kubeconfig=${kubeconfig} | ||
    ...    timeout_seconds=180 | ||
    ...    include_in_history=false | ||
    IF    ${process.returncode} > 0 | ||
        RW.Core.Add Issue    title=OpenTelemetry Span Queue Growing | ||
        ...    severity=3 | ||
        ...    next_steps=Check that the OpenTelemetry backend is available in `${NAMESPACE}`, that the collector has enough resources, and that the collector's configmap is up-to-date. | ||
        ...    expected=Queue size for spans should not exceed the threshold of 500 | ||
        ...    actual=Queue size larger than 500 found | ||
        ...    reproduce_hint=Run otel_metrics_check.sh | ||
        ...    details=${process.stdout} | ||
    END | ||
    RW.Core.Add Pre To Report    ${process.stdout}\n | ||
|
||
Check OpenTelemetry Collector Logs For Errors In Namespace `${NAMESPACE}` | ||
    [Documentation]    Fetch logs and check for errors | ||
    [Tags]    otel    collector    metrics    errors    logs | ||
    ${process}=    RW.CLI.Run Bash File | ||
    ...    bash_file=otel_error_check.sh | ||
    ...    env=${env} | ||
    ...    secret_file__kubeconfig=${kubeconfig} | ||
    ...    timeout_seconds=180 | ||
    ...    include_in_history=false | ||
    IF    ${process.returncode} > 0 | ||
        RW.Core.Add Issue    title=OpenTelemetry Collector Has Error Logs | ||
        ...    severity=3 | ||
        ...    next_steps=Tail OpenTelemetry Collector Logs In Namespace `${NAMESPACE}` For Stacktraces | ||
        ...    expected=Logs do not contain errors | ||
        ...    actual=Found error logs | ||
        ...    reproduce_hint=Run otel_error_check.sh | ||
        ...    details=${process.stdout} | ||
    END | ||
    RW.Core.Add Pre To Report    ${process.stdout}\n | ||
|
||
Scan OpenTelemetry Logs For Dropped Spans In Namespace `${NAMESPACE}` | ||
    [Documentation]    Query the collector logs for dropped spans from errors | ||
    [Tags]    otel    collector    metrics    errors    logs    dropped    rejected | ||
    ${process}=    RW.CLI.Run Bash File | ||
    ...    bash_file=otel_dropped_check.sh | ||
    ...    env=${env} | ||
    ...    secret_file__kubeconfig=${kubeconfig} | ||
    ...    timeout_seconds=180 | ||
    ...    include_in_history=false | ||
    IF    ${process.returncode} > 0 | ||
        RW.Core.Add Issue    title=OpenTelemetry Collector Logs Have Dropped Spans | ||
        ...    severity=3 | ||
        ...    next_steps=Tail OpenTelemetry Collector Logs In Namespace `${NAMESPACE}` For Stacktraces | ||
        ...    expected=Logs do not contain dropped span entries | ||
        ...    actual=Found dropped span entries | ||
        ...    reproduce_hint=Run otel_dropped_check.sh | ||
        ...    details=${process.stdout} | ||
    END | ||
    RW.Core.Add Pre To Report    ${process.stdout}\n | ||
|
||
*** Keywords *** | ||
Suite Initialization | ||
    ${kubeconfig}=    RW.Core.Import Secret | ||
    ...    kubeconfig | ||
    ...    type=string | ||
    ...    description=The kubernetes kubeconfig yaml containing connection configuration used to connect to cluster(s). | ||
    ...    pattern=\w* | ||
    ...    example=For examples, start here https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/ | ||
    ${NAMESPACE}=    RW.Core.Import User Variable    NAMESPACE | ||
    ...    type=string | ||
    ...    description=The name of the Kubernetes namespace to scope actions and searching to. | ||
    ...    pattern=\w* | ||
    ...    example=my-namespace | ||
    ${CONTEXT}=    RW.Core.Import User Variable    CONTEXT | ||
    ...    type=string | ||
    ...    description=Which Kubernetes context to operate within. | ||
    ...    pattern=\w* | ||
    ...    example=my-main-cluster | ||
    ${KUBERNETES_DISTRIBUTION_BINARY}=    RW.Core.Import User Variable    KUBERNETES_DISTRIBUTION_BINARY | ||
    ...    type=string | ||
    ...    description=Which binary to use for Kubernetes CLI commands. | ||
    ...    enum=[kubectl,oc] | ||
    ...    example=kubectl | ||
    ...    default=kubectl | ||
    ${WORKLOAD_SERVICE}=    RW.Core.Import User Variable    WORKLOAD_SERVICE | ||
    ...    type=string | ||
    ...    description=The service name used to curl the otel collector metrics endpoint. | ||
    ...    example=otel-demo-otelcol | ||
    ...    default=otel-demo-otelcol | ||
    ${WORKLOAD_NAME}=    RW.Core.Import User Variable    WORKLOAD_NAME | ||
    ...    type=string | ||
    ...    description=The workload name to act as a bastion-host. The collector can be used, or a bastion host depending on networking requirements. | ||
    ...    example=deployment/otel-demo-otelcol | ||
    ...    default=deployment/otel-demo-otelcol | ||
    ${METRICS_PORT}=    RW.Core.Import User Variable    METRICS_PORT | ||
    ...    type=string | ||
    ...    description=The port used by the collector to serve its metrics at. This will be scraped. | ||
    ...    example=8888 | ||
    ...    default=8888 | ||
    Set Suite Variable    ${kubeconfig}    ${kubeconfig} | ||
    Set Suite Variable    ${CONTEXT}    ${CONTEXT} | ||
    Set Suite Variable    ${KUBERNETES_DISTRIBUTION_BINARY}    ${KUBERNETES_DISTRIBUTION_BINARY} | ||
    Set Suite Variable    ${NAMESPACE}    ${NAMESPACE} | ||
    Set Suite Variable    ${WORKLOAD_SERVICE}    ${WORKLOAD_SERVICE} | ||
    Set Suite Variable    ${WORKLOAD_NAME}    ${WORKLOAD_NAME} | ||
    Set Suite Variable    ${METRICS_PORT}    ${METRICS_PORT} | ||
    Set Suite Variable | ||
    ...    ${env} | ||
    ...    {"KUBECONFIG":"./${kubeconfig.key}", "KUBERNETES_DISTRIBUTION_BINARY":"${KUBERNETES_DISTRIBUTION_BINARY}", "CONTEXT":"${CONTEXT}", "NAMESPACE":"${NAMESPACE}", "METRICS_PORT":"${METRICS_PORT}", "WORKLOAD_NAME":"${WORKLOAD_NAME}", "WORKLOAD_SERVICE":"${WORKLOAD_SERVICE}"} |