Refactor Stacktrace Parsing Codebundles (#396)

* parser modes
* update regexs in base and python parsers, add tests
* Golang file lines from stacktraces
* update parsers
* sample django variant
* app cb title updates
* add django json variant
* fix none edgecase
* fix keywords
* update parsers and template formatting
* runbook cleanup
* fix log call
* Update to use labels so all pods are queried
* fix sli labels
* Fix keyword
* empty report and detail lookup
* Expand data ingestion amount
* update parser for golang json logging
* Add golang json codebundle and parse keyword
* Update parsers to reduce noise in reports
* explicit mcst set
* Fix nones in joins
* fix cmd call
* implement dynamic parse
* Update dynamic sli
* Adjust metadata for dynamic
* syntax fix
* syntax fix
* debug
* add helper return
* 0 result fix
* line str helper change
* duplicate bespoke parser cb cleanup
* genrules
* template tweak
* mcst sanity check
* use knative annotation for genrule
* addressing code review
runwhen-contrib · Jul 11, 2024 · d921ef1
1 parent 315b4e0
commit d921ef1
Showing 22 changed files with 1,272 additions and 101 deletions.
diff --git a/codebundles/k8s-app-troubleshoot/runbook.robot b/codebundles/k8s-app-troubleshoot/runbook.robot
@@ -41,7 +41,7 @@ Scan `${CONTAINER_NAME}` Application For Misconfigured Environment
     RW.Core.Add Pre To Report    Stdout:\n\n${script_run.stdout}
     RW.Core.Add Pre To Report    Commands Used: ${history}
 
-Troubleshoot `${CONTAINER_NAME}` Application Logs
+Tail `${CONTAINER_NAME}` Application Logs For Stacktraces 
     [Documentation]    Performs an inspection on container logs for exceptions/stacktraces, parsing them and attempts to find relevant source code information
     [Tags]
     ...    application
@@ -84,7 +84,7 @@ Troubleshoot `${CONTAINER_NAME}` Application Logs
     # ${test_data}=    RW.K8sApplications.Get Test Data
     ${proc_list}=    RW.K8sApplications.Format Process List    ${proc_list.stdout}
     # ${serialized_env}=    RW.K8sApplications.Serialize env    ${printenv.stdout}
-    ${parsed_exceptions}=    RW.K8sApplications.Parse Exceptions    ${logs.stdout}
+    ${parsed_exceptions}=    RW.K8sApplications.Parse Stacktraces    ${logs.stdout}
     # ${parsed_exceptions}=    RW.K8sApplications.Parse Exceptions    ${test_data}
     ${repos}=    Create List    ${app_repo}
     ${ts_results}=    RW.K8sApplications.Troubleshoot Application
@@ -111,11 +111,11 @@ Troubleshoot `${CONTAINER_NAME}` Application Logs
     IF    (len($parsed_exceptions)) > 0
         RW.Core.Add Issue
         ...    severity=3
-        ...    expected=No exceptions were found in the application logs of ${CONTAINER_NAME}
-        ...    actual=Found exceptions in the application logs of ${CONTAINER_NAME}
+        ...    expected=No stacktraces were found in the application logs of ${CONTAINER_NAME}
+        ...    actual=Found stacktraces in the application logs of ${CONTAINER_NAME}
         ...    reproduce_hint=Run:\n${cmd}\n view logs results for exceptions.
-        ...    title=Application Exceptions detected in ${CONTAINER_NAME}
-        ...    details=This exception prompted the creation of a GitHub issue: ${most_common_exception}
+        ...    title=Application Stacktraces Detected In `${CONTAINER_NAME}`
+        ...    details=This stacktrace prompted the creation of a GitHub issue: ${most_common_exception}
         ...    next_steps=${nextsteps}
     END
 

diff --git a/codebundles/k8s-app-troubleshoot/sli.robot b/codebundles/k8s-app-troubleshoot/sli.robot
@@ -31,7 +31,7 @@ Measure Application Exceptions
     ...    render_in_commandlist=true
     ...    env=${env}
     ...    secret_file__kubeconfig=${kubeconfig}
-    ${parsed_exceptions}=    RW.K8sApplications.Parse Exceptions    ${logs.stdout}
+    ${parsed_exceptions}=    RW.K8sApplications.Parse Stacktraces    ${logs.stdout}
     ${count}=    Evaluate    len($parsed_exceptions)
     RW.Core.Push Metric    ${count}
 

diff --git a/codebundles/k8s-tail-logs-dynamic/.runwhen/generation-rules/k8s-tail-logs-dynamic.yaml b/codebundles/k8s-tail-logs-dynamic/.runwhen/generation-rules/k8s-tail-logs-dynamic.yaml
@@ -0,0 +1,32 @@
+apiVersion: runwhen.com/v1
+kind: GenerationRules
+spec:
+  generationRules:
+    - resourceTypes:
+        - deployment
+      matchRules:
+        - type: and
+          matches:
+            - type: pattern
+              pattern: "kubectl.kubernetes.io//default-container"
+              properties: [annotations]
+              mode: substring
+            - type: pattern
+              pattern: "app"
+              properties: [labels]
+              mode: substring
+            - type: pattern
+              pattern: "codecollection.runwhen.com//app"
+              properties: [annotations]
+              mode: substring
+      slxs:
+        - baseName: k8s-tail-logs-dynamic
+          qualifiers: ["resource", "namespace", "cluster"]
+          baseTemplateName: k8s-tail-logs-dynamic
+          levelOfDetail: detailed
+          outputItems:
+            - type: slx
+            - type: slo
+            - type: runbook
+              templateName: k8s-tail-logs-dynamic-taskset.yaml
+            - type: sli
diff --git a/codebundles/k8s-tail-logs-dynamic/.runwhen/templates/k8s-tail-logs-dynamic-sli.yaml b/codebundles/k8s-tail-logs-dynamic/.runwhen/templates/k8s-tail-logs-dynamic-sli.yaml
@@ -0,0 +1,54 @@
+apiVersion: runwhen.com/v1
+kind: ServiceLevelIndicator
+metadata:
+  name: {{slx_name}}
+  labels:
+    {% include "common-labels.yaml" %}
+  annotations:
+    {% include "common-annotations.yaml" %}
+spec:
+  displayUnitsLong: OK
+  displayUnitsShort: ok
+  locations:
+    - {{default_location}}
+  description: Measures the health of an application workload by parsing its logs for stacktraces.
+  codeBundle:
+    {% if repo_url %}
+    repoUrl: {{repo_url}}
+    {% else %}
+    repoUrl: https://github.com/runwhen-contrib/rw-cli-codecollection.git
+    {% endif %}
+    {% if ref %}
+    ref: {{ref}}
+    {% else %}
+    ref: main
+    {% endif %}
+    pathToRobot: codebundles/k8s-tail-logs-dynamic/sli.robot
+  intervalStrategy: intermezzo
+  intervalSeconds: 60
+  configProvided:
+  - name: KUBERNETES_DISTRIBUTION_BINARY
+    value: kubectl
+  - name: LOGS_SINCE
+    value: 10m
+  - name: LABELS
+    value: app={{match_resource.resource.metadata.labels.app}}
+  - name: EXCLUDE_PATTERN
+    value: INFO
+  - name: CONTAINER_NAME
+    value: {{match_resource.resource.metadata.annotations.get('kubectl.kubernetes.io/default-container')}}
+  - name: MAX_LOG_LINES
+    value: '500'
+  - name: NAMESPACE
+    value: {{match_resource.resource.metadata.namespace}}
+  - name: CONTEXT
+    value: {{context}}
+  - name: STACKTRACE_PARSER
+    value: Dynamic
+  - name: INPUT_MODE
+    value: SPLIT
+  - name: MAX_LOG_BYTES
+    value: '2560000'
+  secretsProvided:
+    - name: kubeconfig
+      workspaceKey: {{custom.kubeconfig_secret_name}}
diff --git a/codebundles/k8s-tail-logs-dynamic/.runwhen/templates/k8s-tail-logs-dynamic-slo.yaml b/codebundles/k8s-tail-logs-dynamic/.runwhen/templates/k8s-tail-logs-dynamic-slo.yaml
@@ -0,0 +1,17 @@
+apiVersion: runwhen.com/v1
+kind: ServiceLevelObjective
+metadata:
+  name: {{slx_name}}
+  labels:
+    {% include "common-labels.yaml" %}
+  annotations:
+    {% include "common-annotations.yaml" %}
+spec:
+  codeBundle:
+    repoUrl: https://github.com/runwhen-contrib/rw-public-codecollection.git
+    pathToYaml: codebundles/slo-default/queries.yaml
+    ref: main
+  sloSpecType: simple-mwmb
+  objective: 99
+  threshold: 1
+  operand: eq
diff --git a/codebundles/k8s-tail-logs-dynamic/.runwhen/templates/k8s-tail-logs-dynamic-slx.yaml b/codebundles/k8s-tail-logs-dynamic/.runwhen/templates/k8s-tail-logs-dynamic-slx.yaml
@@ -0,0 +1,23 @@
+apiVersion: runwhen.com/v1
+kind: ServiceLevelX
+metadata:
+  name: {{slx_name}}
+  labels:
+    {% include "common-labels.yaml" %}
+  annotations:
+    {% include "common-annotations.yaml" %}
+spec:
+  imageURL: https://storage.googleapis.com/runwhen-nonprod-shared-images/icons/gcp/cloud_load_balancing/cloud_load_balancing.svg
+  alias: Tail {{match_resource.resource.metadata.name}} Application Logs For Stacktraces
+  asMeasuredBy: The number of stacktraces present in the application logs.
+  configProvided:
+  - name: OBJECT_NAME
+    value: {{match_resource.resource.metadata.name}}
+  owners:
+  - {{workspace.owner_email}}
+  statement: The application should not be throwing exceptions.
+  additionalContext:  
+    namespace: "{{match_resource.resource.metadata.namespace}}"
+    labelMap: "{{match_resource.resource.metadata.labels}}"  
+    cluster: "{{ cluster.name }}"
+    context: "{{ cluster.context }}"
diff --git a/codebundles/k8s-tail-logs-dynamic/.runwhen/templates/k8s-tail-logs-dynamic-taskset.yaml b/codebundles/k8s-tail-logs-dynamic/.runwhen/templates/k8s-tail-logs-dynamic-taskset.yaml
@@ -0,0 +1,48 @@
+apiVersion: runwhen.com/v1
+kind: Runbook
+metadata:
+  name: {{slx_name}}
+  labels:
+    {% include "common-labels.yaml" %}
+  annotations:
+    {% include "common-annotations.yaml" %}
+spec:
+  location: {{default_location}}
+  codeBundle:
+    {% if repo_url %}
+    repoUrl: {{repo_url}}
+    {% else %}
+    repoUrl: https://github.com/runwhen-contrib/rw-cli-codecollection.git
+    {% endif %}
+    {% if ref %}
+    ref: {{ref}}
+    {% else %}
+    ref: main
+    {% endif %}
+    pathToRobot: codebundles/k8s-tail-logs-dynamic/runbook.robot
+  configProvided:
+  - name: KUBERNETES_DISTRIBUTION_BINARY
+    value: kubectl
+  - name: LOGS_SINCE
+    value: 10m
+  - name: LABELS
+    value: app={{match_resource.resource.metadata.labels.app}}
+  - name: EXCLUDE_PATTERN
+    value: INFO
+  - name: CONTAINER_NAME
+    value: {{match_resource.resource.metadata.annotations.get('kubectl.kubernetes.io/default-container')}}
+  - name: MAX_LOG_LINES
+    value: '500'
+  - name: NAMESPACE
+    value: {{match_resource.resource.metadata.namespace}}
+  - name: CONTEXT
+    value: {{context}}
+  - name: STACKTRACE_PARSER
+    value: Dynamic
+  - name: INPUT_MODE
+    value: SPLIT
+  - name: MAX_LOG_BYTES
+    value: '2560000'
+  secretsProvided:
+    - name: kubeconfig
+      workspaceKey: {{custom.kubeconfig_secret_name}}
diff --git a/codebundles/k8s-tail-logs-dynamic/README.md b/codebundles/k8s-tail-logs-dynamic/README.md
@@ -0,0 +1,38 @@
+# Kubernetes Tail Application Logs For Stacktraces
+
+This codebundle counts stack traces as they appear in your application logs and can produce reports that break them down.
+For it to appear in your workspace, add the following annotations to your application deployments:
+`codecollection.runwhen.com/app` and `kubectl.kubernetes.io/default-container`, where the value of the latter is the name of the container in the deployment to search for stacktraces.
+
+## Configuration
+The TaskSet requires initialization to import necessary secrets, services, and user variables. The following variables should be set:
+
+- `kubeconfig`: The kubeconfig secret containing access info for the cluster.
+- `kubectl`: The location service used to interpret shell commands. Default value is `kubectl-service.shared`.
+- `KUBERNETES_DISTRIBUTION_BINARY`: Which binary to use for Kubernetes CLI commands. Default value is `kubectl`.
+- `CONTEXT`: The Kubernetes context to operate within.
+- `NAMESPACE`: The name of the namespace to search. Leave it blank to search in all namespaces.
+- `LABELS`: The labels used for resource selection, particularly for fetching logs.
+- `LOGS_SINCE`: How far back to scan for logs, e.g. 20m, 3h.
+- `EXCLUDE_PATTERN`: An extended grep pattern used to filter out log results, such as exceptions/errors that you don't care about.
+- `CONTAINER_NAME`: The name of the container within the labeled workload to fetch logs from.
+- `MAX_LOG_LINES`: The maximum number of log lines to fetch. Setting this too high can affect performance.
+- `STACKTRACE_PARSER`: Which parser to use on log lines. If left as `Dynamic`, the first parser to return a result is used for the rest of the logs.
+- `INPUT_MODE`: Determines how logs are fed into the parser. Typically the default should work.
+- `MAX_LOG_BYTES`: Maximum number of bytes to fetch for logs from containers.
+
+## Requirements
+- A kubeconfig with appropriate RBAC permissions to fetch logs.
+
+## Automated Building
+Additionally, your manifests need the following for the workspace builder to automatically set up this codebundle for you:
+
+- A deployment with the following annotations and labels:
+    -   annotations.codecollection.runwhen.com/app: this annotation acts as an opt-in flag
+    -   annotations.kubectl.kubernetes.io/default-container: the name of the container in the pod to search for stacktraces
+    -   labels.app: selector used to grab logs from pods across a deployment
+
+## TODO
+- [ ] Add additional documentation.
+- [ ] Finish suggestions error msg lookup
+
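
The `Dynamic` behaviour that the README above describes for `STACKTRACE_PARSER` (the first parser to return a result is used for the rest of the logs) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this commit: `dynamic_parse`, `parse_python_line`, and `parse_json_line` are hypothetical names, and the two toy parsers only stand in for the real parsers behind `RW.K8sApplications.Parse Stacktraces`.

```python
import json
import re
from typing import Callable, List, Optional

# Hypothetical parser signature: one log line in, a parsed record or None out.
Parser = Callable[[str], Optional[dict]]


def parse_python_line(line: str) -> Optional[dict]:
    """Toy parser: matches the 'File "...", line N' frames of a Python traceback."""
    match = re.search(r'File "(?P<file>[^"]+)", line (?P<line>\d+)', line)
    if match is None:
        return None
    return {
        "parser": "python",
        "file": match.group("file"),
        "line": int(match.group("line")),
    }


def parse_json_line(line: str) -> Optional[dict]:
    """Toy parser: accepts JSON log lines that carry a 'stacktrace' key."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or "stacktrace" not in record:
        return None
    return {"parser": "json", "stacktrace": record["stacktrace"]}


def dynamic_parse(lines: List[str], parsers: List[Parser]) -> List[dict]:
    """Try every parser until one returns a result, then stick with that parser."""
    chosen: Optional[Parser] = None
    results: List[dict] = []
    for line in lines:
        if chosen is None:
            # No parser selected yet: probe each candidate in order.
            for parser in parsers:
                parsed = parser(line)
                if parsed is not None:
                    chosen = parser
                    results.append(parsed)
                    break
        else:
            # A parser already matched once: use only that one from here on.
            parsed = chosen(line)
            if parsed is not None:
                results.append(parsed)
    return results
```

With this sketch, a log that first yields a Python traceback frame is parsed with the Python parser only, so a later JSON line carrying a `stacktrace` key is ignored. That trade-off (less noise, at the cost of missing mixed formats) is a property of the sketch and may not match the real implementation.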