Refactor Stacktrace Parsing Codebundles (#396)
* parser modes

* update regexs in base and python parsers, add tests

* Golang file lines from stacktraces

* update parsers

* sample django variant

* app cb title updates

* add django json variant

* fix none edgecase

* fix keywords

* update parsers and template formatting

* runbook cleanup

* fix log call

* Update to use labels so all pods are queried

* fix sli labels

* Fix keyword

* empty report and detail lookup

* Expand data ingestion amount

* update parser for golang json logging

* Add golang json codebundle and parse keyword

* Update parsers to reduce noise in reports

* explicit mcst set

* Fix nones in joins

* fix cmd call

* implement dynamic parse

* Update dynamic sli

* Adjust metadata for dynamic

* syntax fix

* syntax fix

* debug

* add helper return

* 0 result fix

* line str helper change

* duplicate bespoke parser cb cleanup

* genrules

* template tweak

* mcst sanity check

* use knative annotation for genrule

* addressing code review
jon-funk authored Jul 11, 2024
1 parent 315b4e0 commit d921ef1
Showing 22 changed files with 1,272 additions and 101 deletions.
12 changes: 6 additions & 6 deletions codebundles/k8s-app-troubleshoot/runbook.robot
@@ -41,7 +41,7 @@ Scan `${CONTAINER_NAME}` Application For Misconfigured Environment
RW.Core.Add Pre To Report Stdout:\n\n${script_run.stdout}
RW.Core.Add Pre To Report Commands Used: ${history}

- Troubleshoot `${CONTAINER_NAME}` Application Logs
+ Tail `${CONTAINER_NAME}` Application Logs For Stacktraces
[Documentation] Performs an inspection on container logs for exceptions/stacktraces, parsing them and attempts to find relevant source code information
[Tags]
... application
@@ -84,7 +84,7 @@ Troubleshoot `${CONTAINER_NAME}` Application Logs
# ${test_data}= RW.K8sApplications.Get Test Data
${proc_list}= RW.K8sApplications.Format Process List ${proc_list.stdout}
# ${serialized_env}= RW.K8sApplications.Serialize env ${printenv.stdout}
- ${parsed_exceptions}= RW.K8sApplications.Parse Exceptions ${logs.stdout}
+ ${parsed_exceptions}= RW.K8sApplications.Parse Stacktraces ${logs.stdout}
# ${parsed_exceptions}= RW.K8sApplications.Parse Exceptions ${test_data}
${repos}= Create List ${app_repo}
${ts_results}= RW.K8sApplications.Troubleshoot Application
@@ -111,11 +111,11 @@ Troubleshoot `${CONTAINER_NAME}` Application Logs
IF (len($parsed_exceptions)) > 0
RW.Core.Add Issue
... severity=3
- ... expected=No exceptions were found in the application logs of ${CONTAINER_NAME}
- ... actual=Found exceptions in the application logs of ${CONTAINER_NAME}
+ ... expected=No stacktraces were found in the application logs of ${CONTAINER_NAME}
+ ... actual=Found stacktraces in the application logs of ${CONTAINER_NAME}
... reproduce_hint=Run:\n${cmd}\n view logs results for exceptions.
- ... title=Application Exceptions detected in ${CONTAINER_NAME}
- ... details=This exception prompted the creation of a GitHub issue: ${most_common_exception}
+ ... title=Application Stacktraces Detected In `${CONTAINER_NAME}`
+ ... details=This stacktrace prompted the creation of a GitHub issue: ${most_common_exception}
... next_steps=${nextsteps}
END

2 changes: 1 addition & 1 deletion codebundles/k8s-app-troubleshoot/sli.robot
@@ -31,7 +31,7 @@ Measure Application Exceptions
... render_in_commandlist=true
... env=${env}
... secret_file__kubeconfig=${kubeconfig}
- ${parsed_exceptions}= RW.K8sApplications.Parse Exceptions ${logs.stdout}
+ ${parsed_exceptions}= RW.K8sApplications.Parse Stacktraces ${logs.stdout}
${count}= Evaluate len($parsed_exceptions)
RW.Core.Push Metric ${count}

@@ -0,0 +1,32 @@
apiVersion: runwhen.com/v1
kind: GenerationRules
spec:
generationRules:
- resourceTypes:
- deployment
matchRules:
- type: and
matches:
- type: pattern
pattern: "kubectl.kubernetes.io//default-container"
properties: [annotations]
mode: substring
- type: pattern
pattern: "app"
properties: [labels]
mode: substring
- type: pattern
pattern: "codecollection.runwhen.com//app"
properties: [annotations]
mode: substring
slxs:
- baseName: k8s-tail-logs-dynamic
qualifiers: ["resource", "namespace", "cluster"]
baseTemplateName: k8s-tail-logs-dynamic
levelOfDetail: detailed
outputItems:
- type: slx
- type: slo
- type: runbook
templateName: k8s-tail-logs-dynamic-taskset.yaml
- type: sli
@@ -0,0 +1,54 @@
apiVersion: runwhen.com/v1
kind: ServiceLevelIndicator
metadata:
name: {{slx_name}}
labels:
{% include "common-labels.yaml" %}
annotations:
{% include "common-annotations.yaml" %}
spec:
displayUnitsLong: OK
displayUnitsShort: ok
locations:
- {{default_location}}
description: Measures the health of an application workload by parsing for stacktraces in its logs.
codeBundle:
{% if repo_url %}
repoUrl: {{repo_url}}
{% else %}
repoUrl: https://github.com/runwhen-contrib/rw-cli-codecollection.git
{% endif %}
{% if ref %}
ref: {{ref}}
{% else %}
ref: main
{% endif %}
pathToRobot: codebundles/k8s-tail-logs-dynamic/sli.robot
intervalStrategy: intermezzo
intervalSeconds: 60
configProvided:
- name: KUBERNETES_DISTRIBUTION_BINARY
value: kubectl
- name: LOGS_SINCE
value: 10m
- name: LABELS
value: app={{match_resource.resource.metadata.labels.app}}
- name: EXCLUDE_PATTERN
value: INFO
- name: CONTAINER_NAME
value: {{match_resource.resource.metadata.annotations.get('kubectl.kubernetes.io/default-container')}}
- name: MAX_LOG_LINES
value: '500'
- name: NAMESPACE
value: {{match_resource.resource.metadata.namespace}}
- name: CONTEXT
value: {{context}}
- name: STACKTRACE_PARSER
value: Dynamic
- name: INPUT_MODE
value: SPLIT
- name: MAX_LOG_BYTES
value: '2560000'
secretsProvided:
- name: kubeconfig
workspaceKey: {{custom.kubeconfig_secret_name}}
@@ -0,0 +1,17 @@
apiVersion: runwhen.com/v1
kind: ServiceLevelObjective
metadata:
name: {{slx_name}}
labels:
{% include "common-labels.yaml" %}
annotations:
{% include "common-annotations.yaml" %}
spec:
codeBundle:
repoUrl: https://github.com/runwhen-contrib/rw-public-codecollection.git
pathToYaml: codebundles/slo-default/queries.yaml
ref: main
sloSpecType: simple-mwmb
objective: 99
threshold: 1
operand: eq
@@ -0,0 +1,23 @@
apiVersion: runwhen.com/v1
kind: ServiceLevelX
metadata:
name: {{slx_name}}
labels:
{% include "common-labels.yaml" %}
annotations:
{% include "common-annotations.yaml" %}
spec:
imageURL: https://storage.googleapis.com/runwhen-nonprod-shared-images/icons/gcp/cloud_load_balancing/cloud_load_balancing.svg
alias: Tail {{match_resource.resource.metadata.name}} Application Logs For Stacktraces
asMeasuredBy: The number of stacktraces present in the application logs.
configProvided:
- name: OBJECT_NAME
value: {{match_resource.resource.metadata.name}}
owners:
- {{workspace.owner_email}}
statement: The application should not be throwing exceptions.
additionalContext:
namespace: "{{match_resource.resource.metadata.namespace}}"
labelMap: "{{match_resource.resource.metadata.labels}}"
cluster: "{{ cluster.name }}"
context: "{{ cluster.context }}"
@@ -0,0 +1,48 @@
apiVersion: runwhen.com/v1
kind: Runbook
metadata:
name: {{slx_name}}
labels:
{% include "common-labels.yaml" %}
annotations:
{% include "common-annotations.yaml" %}
spec:
location: {{default_location}}
codeBundle:
{% if repo_url %}
repoUrl: {{repo_url}}
{% else %}
repoUrl: https://github.com/runwhen-contrib/rw-cli-codecollection.git
{% endif %}
{% if ref %}
ref: {{ref}}
{% else %}
ref: main
{% endif %}
pathToRobot: codebundles/k8s-tail-logs-dynamic/runbook.robot
configProvided:
- name: KUBERNETES_DISTRIBUTION_BINARY
value: kubectl
- name: LOGS_SINCE
value: 10m
- name: LABELS
value: app={{match_resource.resource.metadata.labels.app}}
- name: EXCLUDE_PATTERN
value: INFO
- name: CONTAINER_NAME
value: {{match_resource.resource.metadata.annotations.get('kubectl.kubernetes.io/default-container')}}
- name: MAX_LOG_LINES
value: '500'
- name: NAMESPACE
value: {{match_resource.resource.metadata.namespace}}
- name: CONTEXT
value: {{context}}
- name: STACKTRACE_PARSER
value: Dynamic
- name: INPUT_MODE
value: SPLIT
- name: MAX_LOG_BYTES
value: '2560000'
secretsProvided:
- name: kubeconfig
workspaceKey: {{custom.kubeconfig_secret_name}}
38 changes: 38 additions & 0 deletions codebundles/k8s-tail-logs-dynamic/README.md
@@ -0,0 +1,38 @@
# Kubernetes Tail Application Logs For Stacktraces

This codebundle counts stacktraces as they appear in your application logs and can produce reports that break them down.
For it to appear in your workspace, add the following annotations to your application deployments:
`codecollection.runwhen.com/app` and `kubectl.kubernetes.io/default-container`, with the latter set to the name of the container in the deployment to search for stacktraces.

## Configuration
The TaskSet requires initialization to import necessary secrets, services, and user variables. The following variables should be set:

- `kubeconfig`: The kubeconfig secret containing access info for the cluster.
- `kubectl`: The location service used to interpret shell commands. Default value is `kubectl-service.shared`.
- `KUBERNETES_DISTRIBUTION_BINARY`: Which binary to use for Kubernetes CLI commands. Default value is `kubectl`.
- `CONTEXT`: The Kubernetes context to operate within.
- `NAMESPACE`: The name of the namespace to search. Leave it blank to search in all namespaces.
- `LABELS`: The labels used for resource selection, particularly for fetching logs.
- `LOGS_SINCE`: How far back to scan for logs, e.g. `20m`, `3h`.
- `EXCLUDE_PATTERN`: An extended grep pattern used to filter out log results, such as exceptions/errors that you don't care about.
- `CONTAINER_NAME`: The name of the container within the labeled workload to fetch logs from.
- `MAX_LOG_LINES`: The maximum number of log lines to fetch. Setting this too high can affect performance.
- `STACKTRACE_PARSER`: Which parser to use on log lines. If left as `Dynamic`, the first parser to return a result is used for the rest of the logs.
- `INPUT_MODE`: Determines how logs are fed into the parser. Typically the default should work.
- `MAX_LOG_BYTES`: Maximum number of bytes to fetch for logs from containers.
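
The `Dynamic` behavior of `STACKTRACE_PARSER` described above can be sketched roughly as follows. This is a minimal illustration, not the codebundle's implementation: the parser functions, the `PARSERS` registry, its keys, and `parse_stacktraces` are hypothetical names, and the matching logic is deliberately simplistic.

```python
import json
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical parser signature: return the stacktrace text found in a log
# line, or None when the line does not look like a stacktrace to this parser.
Parser = Callable[[str], Optional[str]]


def parse_python(line: str) -> Optional[str]:
    # Simplistic stand-in for a Python traceback parser.
    return line if "Traceback (most recent call last)" in line else None


def parse_golang_json(line: str) -> Optional[str]:
    # Simplistic stand-in for a Go JSON-logging parser; the "stacktrace"
    # field name is an assumption made for this example.
    try:
        record = json.loads(line)
    except ValueError:
        return None
    if not isinstance(record, dict):
        return None
    trace = record.get("stacktrace")
    return trace if isinstance(trace, str) else None


# Hypothetical registry; the keys are illustrative, not the real option values.
PARSERS: Dict[str, Parser] = {
    "Python": parse_python,
    "GoLangJson": parse_golang_json,
}


def parse_stacktraces(lines: Iterable[str], parser_name: str = "Dynamic") -> List[str]:
    # With an explicit parser name, only that parser is ever used.
    chosen: Optional[Parser] = None if parser_name == "Dynamic" else PARSERS[parser_name]
    results: List[str] = []
    for line in lines:
        if chosen is not None:
            hit = chosen(line)
            if hit is not None:
                results.append(hit)
            continue
        # Dynamic mode: try each parser until one returns a result, then
        # keep that parser for the remaining lines.
        for candidate in PARSERS.values():
            hit = candidate(line)
            if hit is not None:
                chosen = candidate
                results.append(hit)
                break
    return results
```

One consequence of this design, at least in the sketch, is that once a parser is selected, lines in another format are no longer matched, which keeps reports consistent for a single-language workload.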

## Requirements
- A kubeconfig with appropriate RBAC permissions to fetch logs.

## Automated Building
Additionally, your manifests need the following for the workspace builder to automatically set up this codebundle for you:

- A deployment with the following annotations and labels:
  - annotations.codecollection.runwhen.com/app: this annotation acts as an opt-in flag
  - annotations.kubectl.kubernetes.io/default-container: the name of the container in the pod to search for stacktraces
  - labels.app: selector used to grab logs from pods across a deployment
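
As a rough way to check whether a deployment meets these requirements, the sketch below inspects a manifest that has already been loaded into a Python dict. The function name `tail_logs_config` is hypothetical, and the returned keys only mirror the `LABELS`, `CONTAINER_NAME`, and `NAMESPACE` variables listed under Configuration; the workspace builder's own matching may differ in detail.

```python
from typing import Any, Dict, Optional

OPT_IN_ANNOTATION = "codecollection.runwhen.com/app"
DEFAULT_CONTAINER_ANNOTATION = "kubectl.kubernetes.io/default-container"


def tail_logs_config(deployment: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Return the derived settings, or None if the deployment has not opted in.

    Hypothetical helper for illustration only.
    """
    metadata = deployment.get("metadata") or {}
    annotations = metadata.get("annotations") or {}
    labels = metadata.get("labels") or {}

    # The opt-in annotation only needs to be present.
    if OPT_IN_ANNOTATION not in annotations:
        return None

    container = annotations.get(DEFAULT_CONTAINER_ANNOTATION)
    app = labels.get("app")
    if not container or not app:
        return None

    return {
        "LABELS": f"app={app}",
        "CONTAINER_NAME": container,
        "NAMESPACE": metadata.get("namespace", ""),
    }
```

A deployment that is missing any of the three fields yields `None` here, which corresponds to the codebundle not being set up for it.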

## TODO
- [ ] Add additional documentation.
- [ ] Finish suggestions error msg lookup
