Adjust template / match rules for tail-logs (#398)
stewartshea authored Jul 15, 2024
1 parent 4ea7a15 commit 5ab13a5
Showing 10 changed files with 43 additions and 24 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -1,6 +1,7 @@
Troubleshooting Tasks in Codecollection: **178**
Codebundles in Codecollection: **66**


![](docs/GitHub_Banner.jpg)

<p align="center">
@@ -20,3 +20,4 @@ spec:
- type: slo
- type: runbook
templateName: k8s-namespace-healthcheck-taskset.yaml
# - type: workflow
@@ -40,9 +40,9 @@ spec:
- name: DISTRIBUTION
value: Kubernetes
- name: EVENT_THRESHOLD
value: '0'
value: '3'
- name: CONTAINER_RESTART_THRESHOLD
value: '0'
value: '2'
secretsProvided:
- name: kubeconfig
workspaceKey: {{custom.kubeconfig_secret_name}}
@@ -25,18 +25,12 @@ spec:
value: {{custom.kubernetes_distribution_binary}}
- name: NAMESPACE
value: {{match_resource.resource.metadata.name}}
- name: ERROR_PATTERN
value: (Error|Exception)
- name: CONTEXT
value: {{context}}
- name: SERVICE_ERROR_PATTERN
value: (Error:)
- name: SERVICE_EXCLUDE_PATTERN
value: (node_modules|opentelemetry)
- name: ANOMALY_THRESHOLD
value: "3.0"
- name: EVENT_AGE
value: "30"
value: "5m"
secretsProvided:
- name: kubeconfig
workspaceKey: {{custom.kubeconfig_secret_name}}
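In this hunk `EVENT_AGE` changes from a bare number (`"30"`) to a duration string (`"5m"`). The sketch below shows one way a consumer could normalise both forms into a `timedelta`. It is a minimal illustration, not the codebundle's own parsing: the helper name `parse_event_age`, the accepted `s`/`m`/`h`/`d` suffixes, and the rule that a bare number means minutes are all assumptions.

```python
import re
from datetime import timedelta

# Hypothetical helper: the suffix set and the "bare number = minutes"
# rule are assumptions, not taken from the codebundle.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_PATTERN = re.compile(r"^(\d+)([smhd]?)$")


def parse_event_age(value: str) -> timedelta:
    """Convert an EVENT_AGE string such as "5m" or "30" into a timedelta."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported EVENT_AGE value: {value!r}")
    amount, unit = int(match.group(1)), match.group(2) or "m"
    return timedelta(**{_UNITS[unit]: amount})


if __name__ == "__main__":
    print(parse_event_age("5m"))  # 0:05:00
    print(parse_event_age("30"))  # 0:30:00
```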
@@ -0,0 +1,30 @@
apiVersion: runwhen.com/v1
kind: Workflow
metadata:
name: {{slx_name}}-ns-alert-workflow
labels:
{% include "common-labels.yaml" %}
annotations:
{% include "common-annotations.yaml" %}
spec:
fromActivities:
- displayName: {{namespace.name}} Namespace SLO Alert Workflow
description: Start RunSession with Eager Edgar when SLO is alerting for {{namespace.name}} namespace health
actions:
- tasks:
slx: {{slx_name.split('--')[1]}}
persona: eager-edgar
titles:
- Inspect Warning Events in Namespace `${NAMESPACE}`
- Inspect Container Restarts In Namespace `${NAMESPACE}`
- Inspect Pending Pods In Namespace `${NAMESPACE}`
- Inspect Failed Pods In Namespace `${NAMESPACE}`
- Inspect Workload Status Conditions In Namespace `${NAMESPACE}`
- Check Event Anomalies in Namespace `${NAMESPACE}`
- Check Resource Quota Utilization in Namespace `${NAMESPACE}`
match:
activityVerbs:
- ALERTS_STARTED
slxs:
- {{slx_name.split('--')[1]}}
name: {{slx_name.split('--')[1]}}-slo-alert-workflow
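The new workflow derives the SLX short name with the Jinja expression `slx_name.split('--')[1]`, which keeps the segment after the first `--`, and fires on `ALERTS_STARTED` activities for that SLX. The Python sketch below mirrors that logic for illustration only; the `Activity` type, the helper names, the sample `slx_name`, and the assumption that both `activityVerbs` and `slxs` must match are hypothetical and not taken from RunWhen's implementation.

```python
from dataclasses import dataclass
from typing import Collection


@dataclass(frozen=True)
class Activity:
    """Hypothetical stand-in for a platform activity event."""
    verb: str
    slx: str


def short_slx_name(slx_name: str) -> str:
    # Mirrors {{slx_name.split('--')[1]}}: the segment after the first "--".
    # Raises IndexError if slx_name contains no "--" at all.
    return slx_name.split("--")[1]


def workflow_matches(activity: Activity,
                     activity_verbs: Collection[str],
                     slxs: Collection[str]) -> bool:
    # Assumption: the verb AND the SLX must both appear in the match block.
    return activity.verb in activity_verbs and activity.slx in slxs


if __name__ == "__main__":
    slx = short_slx_name("my-workspace--ns-health")  # hypothetical name
    print(slx)  # ns-health
    print(workflow_matches(Activity("ALERTS_STARTED", slx),
                           {"ALERTS_STARTED"}, {slx}))  # True
```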
7 changes: 0 additions & 7 deletions codebundles/k8s-namespace-healthcheck/runbook.robot
@@ -497,12 +497,6 @@ Suite Initialization
... description=Which Kubernetes context to operate within.
... pattern=\w*
... example=my-main-cluster
${ERROR_PATTERN}= RW.Core.Import User Variable ERROR_PATTERN
... type=string
... description=The error pattern to use when grep-ing logs.
... pattern=\w*
... example=(Error|Exception)
... default=(Error|Exception)
${ANOMALY_THRESHOLD}= RW.Core.Import User Variable
... ANOMALY_THRESHOLD
... type=string
@@ -533,7 +527,6 @@ Suite Initialization
Set Suite Variable ${KUBERNETES_DISTRIBUTION_BINARY} ${KUBERNETES_DISTRIBUTION_BINARY}
Set Suite Variable ${NAMESPACE} ${NAMESPACE}
Set Suite Variable ${EVENT_AGE} ${EVENT_AGE}
Set Suite Variable ${ERROR_PATTERN} ${ERROR_PATTERN}
Set Suite Variable ${ANOMALY_THRESHOLD} ${ANOMALY_THRESHOLD}
Set Suite Variable ${HOME} ${HOME}
Set Suite Variable
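This commit drops `ERROR_PATTERN` (default `(Error|Exception)`, described as "The error pattern to use when grep-ing logs") from the runbook. For reference, the sketch below shows what grep-style filtering with such a pattern amounts to, with an optional exclusion pattern like the `SERVICE_EXCLUDE_PATTERN` value `(node_modules|opentelemetry)` removed from the taskset template. It uses Python's `re` rather than `grep -E`; the two regex dialects agree for simple alternations like these but can differ for more complex patterns. The function name `grep_lines` is hypothetical.

```python
import re
from typing import Iterable, List, Optional


def grep_lines(lines: Iterable[str],
               error_pattern: str = r"(Error|Exception)",
               exclude_pattern: Optional[str] = None) -> List[str]:
    """Return lines matching error_pattern and not matching exclude_pattern."""
    include = re.compile(error_pattern)
    exclude = re.compile(exclude_pattern) if exclude_pattern else None
    return [
        line for line in lines
        if include.search(line) and not (exclude and exclude.search(line))
    ]


if __name__ == "__main__":
    logs = [
        "INFO server started",
        "Error: connection refused",
        "Exception raised in node_modules/pkg/index.js",
    ]
    print(grep_lines(logs, exclude_pattern=r"(node_modules|opentelemetry)"))
    # ['Error: connection refused']
```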
4 changes: 2 additions & 2 deletions codebundles/k8s-namespace-healthcheck/sli.robot
@@ -38,7 +38,7 @@ Suite Initialization
... description=The maximum total events to be still considered healthy.
... pattern=^\d+$
... example=2
... default=0
... default=2
${CONTAINER_RESTART_AGE}= RW.Core.Import User Variable CONTAINER_RESTART_AGE
... type=string
... description=The time window in minutes as search for container restarts.
@@ -50,7 +50,7 @@ Suite Initialization
... description=The maximum total container restarts to be still considered healthy.
... pattern=^\d+$
... example=2
... default=0
... default=3
${KUBERNETES_DISTRIBUTION_BINARY}= RW.Core.Import User Variable KUBERNETES_DISTRIBUTION_BINARY
... type=string
... description=Which binary to use for Kubernetes CLI commands.
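Both `sli.robot` thresholds shown here are validated with `pattern=^\d+$` and described as the maximum total events or container restarts "to be still considered healthy", so raising the defaults from 0 to 2 and 3 tolerates a small number of events and restarts. The sketch below models that comparison, assuming "maximum ... still considered healthy" means `observed <= threshold`; the helper names are hypothetical and the real SLI may score differently.

```python
import re

# Same validation pattern as the sli.robot threshold variables.
THRESHOLD_PATTERN = re.compile(r"^\d+$")


def parse_threshold(raw: str) -> int:
    """Validate a threshold string and convert it to an int."""
    if not THRESHOLD_PATTERN.match(raw):
        raise ValueError(f"threshold must be a non-negative integer, got {raw!r}")
    return int(raw)


def is_healthy(observed_count: int, raw_threshold: str) -> bool:
    # Assumption: counts up to and including the threshold are healthy.
    return observed_count <= parse_threshold(raw_threshold)


if __name__ == "__main__":
    print(is_healthy(2, "2"))  # True: 2 events with the new default of 2
    print(is_healthy(2, "0"))  # False under the old default of 0
```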
@@ -8,12 +8,12 @@ spec:
- type: and
matches:
- type: pattern
pattern: "kubectl.kubernetes.io//default-container"
properties: ["spec/template/metadata/annotations"]
pattern: ".+"
properties: ["spec/template/metadata/annotations/kubectl.kubernetes.io//default-container"]
mode: substring
- type: pattern
pattern: "codecollection.runwhen.com//app"
properties: [annotations]
pattern: "codecollection.runwhen.com/app"
properties: [labels]
mode: substring
slxs:
- baseName: k8s-tail-logs-dynamic
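The match rule now requires a non-empty `kubectl.kubernetes.io/default-container` annotation on the pod template (pattern `.+` against that property) and a `codecollection.runwhen.com/app` key in the resource labels, with both conditions joined by `type: and`. The sketch below models that evaluation in Python under several assumptions that the source does not confirm: properties are `/`-separated paths where `//` encodes a literal `/` inside a key, `mode: substring` means a regex search anywhere in the stringified value, and `labels` is exposed as a top-level property. All function names are hypothetical.

```python
import re
from typing import Any, Iterable, Optional, Tuple

_SLASH = "\0"  # placeholder for an escaped "/" while splitting


def resolve_property(resource: dict, path: str) -> Optional[Any]:
    # Assumption: "/" separates segments and "//" is a literal "/" in a key.
    node: Any = resource
    for segment in path.replace("//", _SLASH).split("/"):
        key = segment.replace(_SLASH, "/")
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def pattern_matches(resource: dict, pattern: str, prop: str) -> bool:
    # Assumption: substring mode = regex search anywhere in str(value).
    value = resolve_property(resource, prop)
    return value is not None and re.search(pattern, str(value)) is not None


def all_match(resource: dict, rules: Iterable[Tuple[str, str]]) -> bool:
    # "type: and": every (pattern, property) pair must match.
    return all(pattern_matches(resource, p, prop) for p, prop in rules)


RULES = [
    (".+", "spec/template/metadata/annotations/kubectl.kubernetes.io//default-container"),
    ("codecollection.runwhen.com/app", "labels"),
]
```

With these assumptions, a Deployment carrying both the annotation and the label matches, while one missing either does not.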
@@ -32,7 +32,7 @@ spec:
- name: LOGS_SINCE
value: 10m
- name: LABELS
value: app={{match_resource.resource.metadata.labels.app}}
value: codecollection.runwhen.com/app={{match_resource.resource.metadata.labels.get('codecollection.runwhen.com/app')}}
- name: EXCLUDE_PATTERN
value: INFO
- name: CONTAINER_NAME
@@ -26,7 +26,7 @@ spec:
- name: LOGS_SINCE
value: 10m
- name: LABELS
value: app={{match_resource.resource.metadata.labels.app}}
value: codecollection.runwhen.com/app={{match_resource.resource.metadata.labels.get('codecollection.runwhen.com/app')}}
- name: EXCLUDE_PATTERN
value: INFO
- name: CONTAINER_NAME
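Both tail-logs templates now build `LABELS` from the `codecollection.runwhen.com/app` label instead of `app`. Jinja's dotted attribute syntax cannot address a key containing dots and slashes, which is why the template calls `.get('codecollection.runwhen.com/app')` on the labels mapping. The Python sketch below reproduces the rendered selector string, the kind of value `kubectl logs -l` accepts; the helper name and the `None` guard are hypothetical additions that the template itself does not have.

```python
from typing import Mapping, Optional

APP_LABEL = "codecollection.runwhen.com/app"


def build_label_selector(labels: Mapping[str, str]) -> Optional[str]:
    """Render the LABELS value, e.g. codecollection.runwhen.com/app=cart."""
    value = labels.get(APP_LABEL)
    if value is None:
        # Hypothetical guard: skip resources without the label rather than
        # emitting a selector with a missing value.
        return None
    return f"{APP_LABEL}={value}"
```

Because the updated match rule already requires this label, the guard should rarely trigger for resources that reach the template.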