Codebundle index update. (#360)

Co-authored-by: stewartshea@users.noreply.github.com <stewartshea>
runwhen-contrib · May 5, 2024 · 92a84bc
1 parent 7d30f67
Showing 4 changed files with 153 additions and 4 deletions.
diff --git a/README.md b/README.md
@@ -1,5 +1,5 @@
-Troubleshooting Tasks in Codecollection: **145**
-Codebundles in Codecollection: **51**
+Troubleshooting Tasks in Codecollection: **148**
+Codebundles in Codecollection: **53**
 
 ![](docs/GitHub_Banner.jpg)
 
@@ -65,6 +65,7 @@ Run the codebundle
 | [Kubernetes ArgoCD Application Health & Troubleshoot](https://github.com/runwhen-contrib/rw-cli-codecollection/blob/main/codebundles/k8s-argocd-application-health/runbook.robot) | `Kubernetes`, `AKS`, `EKS`, `GKE`, `OpenShift`, `ArgoCD` | `Fetch ArgoCD Application Sync Status & Health for `${APPLICATION}``, `Fetch ArgoCD Application Last Sync Operation Details for `${APPLICATION}``, `Fetch Unhealthy ArgoCD Application Resources for `${APPLICATION}``, `Scan For Errors in Pod Logs Related to ArgoCD Application `${APPLICATION}``, `Fully Describe ArgoCD Application `${APPLICATION}`` | This taskset collects information and runs general troubleshooting checks against argocd application objects within a namespace. [Docs](https://docs.runwhen.com/public/v/cli-codecollection/k8s-argocd-application-health) |
 | [Kubernetes ArgoCD HelmRelease TaskSet](https://github.com/runwhen-contrib/rw-cli-codecollection/blob/main/codebundles/k8s-argocd-helm-health/runbook.robot) | `Kubernetes`, `AKS`, `EKS`, `GKE`, `OpenShift`, `ArgoCD` | `Fetch all available ArgoCD Helm releases in namespace `${NAMESPACE}``, `Fetch Installed ArgoCD Helm release versions in namespace `${NAMESPACE}`` | This codebundle runs a series of tasks to identify potential helm release issues related to ArgoCD managed Helm objects. [Docs](https://docs.runwhen.com/public/v/cli-codecollection/k8s-argocd-helm-health) |
 | [Kubernetes Artifactory Triage](https://github.com/runwhen-contrib/rw-cli-codecollection/blob/main/codebundles/k8s-artifactory-health/runbook.robot) | `Kubernetes`, `AKS`, `EKS`, `GKE`, `OpenShift`, `Artifactory` | `Check Artifactory Liveness and Readiness Endpoints` | Performs a triage on the Open Source version of Artifactory in a Kubernetes cluster. [Docs](https://docs.runwhen.com/public/v/cli-codecollection/k8s-artifactory-health) |
+| [Kubernetes Cluster Resource Health](https://github.com/runwhen-contrib/rw-cli-codecollection/blob/main/codebundles/k8s-cluster-resource-health/runbook.robot) | `Kubernetes`, `AKS`, `EKS`, `GKE`, `OpenShift` | `Identify High Utilization Nodes for Cluster `${CONTEXT}``, `Identify Pods Causing High Node Utilization in Cluster `${CONTEXT}`` | Identify resource constraints or issues in a cluster. [Docs](https://docs.runwhen.com/public/v/cli-codecollection/k8s-cluster-resource-health) |
 | [Kubernetes Daemonset Triage](https://github.com/runwhen-contrib/rw-cli-codecollection/blob/main/codebundles/k8s-daemonset-healthcheck/runbook.robot) | `Kubernetes`, `AKS`, `EKS`, `GKE`, `OpenShift` | `Get DaemonSet Log Details For Report`, `Get Related Daemonset Events`, `Check Daemonset Replicas` | Triages issues related to a Daemonset and its available replicas. [Docs](https://docs.runwhen.com/public/v/cli-codecollection/k8s-daemonset-healthcheck) |
 | [Kubernetes Deployment Triage](https://github.com/runwhen-contrib/rw-cli-codecollection/blob/main/codebundles/k8s-deployment-healthcheck/runbook.robot) | `Kubernetes`, `AKS`, `EKS`, `GKE`, `OpenShift` | `Check Deployment Log For Issues with `${DEPLOYMENT_NAME}``, `Check Liveness Probe Configuration for Deployment `${DEPLOYMENT_NAME}``, `Check Readiness Probe Configuration for Deployment `${DEPLOYMENT_NAME}``, `Troubleshoot Deployment Warning Events for `${DEPLOYMENT_NAME}``, `Get Deployment Workload Details For `${DEPLOYMENT_NAME}` and Add to Report`, `Troubleshoot Deployment Replicas for `${DEPLOYMENT_NAME}``, `Check Deployment Event Anomalies for `${DEPLOYMENT_NAME}``, `Check ReplicaSet Health for Deployment `${DEPLOYMENT_NAME}`` | Triages issues related to a deployment and its replicas. [Docs](https://docs.runwhen.com/public/v/cli-codecollection/k8s-deployment-healthcheck) |
 | [Kubernetes Flux Chaos Testing](https://github.com/runwhen-contrib/rw-cli-codecollection/blob/main/codebundles/k8s-chaos-flux/runbook.robot) | `Kubernetes`, `AKS`, `EKS`, `GKE`, `OpenShift` | `Suspend the Flux Resource Reconciliation`, `Find Random FluxCD Workload as Chaos Target`, `Execute Chaos Command`, `Execute Additional Chaos Command`, `Resume Flux Resource Reconciliation` | This taskset is used to suspend a flux resource for the purposes of executing chaos tasks. [Docs](https://docs.runwhen.com/public/v/cli-codecollection/k8s-chaos-flux) |

diff --git a/SUMMARY.md b/SUMMARY.md
@@ -10,6 +10,7 @@
 * [k8s-vault-healthcheck](codebundles/k8s-vault-healthcheck/README.md)
 * [k8s-ingress-gce-healthcheck](codebundles/k8s-ingress-gce-healthcheck/README.md)
 * [k8s-statefulset-healthcheck](codebundles/k8s-statefulset-healthcheck/README.md)
+* [k8s-cluster-resource-health](codebundles/k8s-cluster-resource-health/README.md)
 * [k8s-ingress-healthcheck](codebundles/k8s-ingress-healthcheck/README.md)
 * [azure-loadbalancer-triage](codebundles/azure-loadbalancer-triage/README.md)
 * [k8s-pvc-healthcheck](codebundles/k8s-pvc-healthcheck/README.md)

diff --git a/codebundles/k8s-cluster-resource-health/meta.yaml b/codebundles/k8s-cluster-resource-health/meta.yaml
@@ -0,0 +1,149 @@
+commands:
+- command: bash 'get_high_use_nodes.sh'
+  doc_links: '
+
+    - [Analyzing resource usage in Kubernetes](https://kubernetes.io/docs/tasks/debug-application-cluster/resource-usage-monitoring/){:target="_blank"}'
+  explanation: This bash script gathers and analyzes resource allocation and usage
+    data for nodes in a Kubernetes cluster. It retrieves node details, allocatable
+    resources, and current usage, then identifies nodes whose CPU or memory
+    utilization is at or above 90% and writes the results to a JSON file called
+    high_use_nodes.json.
+  multi_line_details: "\n#!/bin/bash\n\n# Define Kubernetes binary and context with\
+    \ dynamic defaults\nKUBERNETES_DISTRIBUTION_BINARY=\"${KUBERNETES_DISTRIBUTION_BINARY:-kubectl}\"\
+    \ # Default to 'kubectl' if not set in the environment\nDEFAULT_CONTEXT=$(${KUBERNETES_DISTRIBUTION_BINARY}\
+    \ config current-context)\nCONTEXT=\"${CONTEXT:-$DEFAULT_CONTEXT}\" # Use environment\
+    \ variable or the current context from kubectl\n\n# Function to process nodes\
+    \ and their resource usage\nprocess_nodes_and_usage() {\n    # Get Node Details\
+    \ including allocatable resources\n    nodes=$(${KUBERNETES_DISTRIBUTION_BINARY}\
+    \ get nodes --context ${CONTEXT} -o json | jq '[.items[] | {\n        name: .metadata.name,\n\
+    \        cpu_allocatable: (.status.allocatable.cpu | rtrimstr(\"m\") | tonumber),\n\
+    \        memory_allocatable: (.status.allocatable.memory | gsub(\"Ki\"; \"\")\
+    \ | tonumber / 1024)\n    }]')\n\n    # Fetch node usage details\n    usage=$(${KUBERNETES_DISTRIBUTION_BINARY}\
+    \ top nodes --context ${CONTEXT} | awk 'BEGIN { printf \"[\" } NR>1 { printf \"\
+    %s{\\\"name\\\":\\\"%s\\\",\\\"cpu_usage\\\":\\\"%s\\\",\\\"memory_usage\\\":\\\
+    \"%s\\\"}\", (NR>2 ? \",\" : \"\"), $1, ($2 == \"<unknown>\" ? \"0\" : $2), ($4\
+    \ == \"<unknown>\" ? \"0\" : $4) } END { printf \"]\" }' | jq '.')\n\n    # Combine\
+    \ and process the data\n    jq -n --argjson nodes \"$nodes\" --argjson usage \"\
+    $usage\" '{\n        nodes: $nodes | map({name: .name, cpu_allocatable: .cpu_allocatable,\
+    \ memory_allocatable: .memory_allocatable}),\n        usage: $usage | map({name:\
+    \ .name, cpu_usage: (.cpu_usage | rtrimstr(\"m\") | tonumber // 0), memory_usage:\
+    \ (.memory_usage | rtrimstr(\"Mi\") | tonumber // 0)})\n    } | .nodes as $nodes\
+    \ | .usage as $usage | \n    $nodes | map(\n        . as $node | \n        $usage[]\
+    \ | \n        select(.name == $node.name) | \n        {\n            name: .name,\
+    \ \n            cpu_utilization_percentage: (.cpu_usage / $node.cpu_allocatable\
+    \ * 100),\n            memory_utilization_percentage: (.memory_usage / $node.memory_allocatable\
+    \ * 100)\n        }\n    ) | map(select(.cpu_utilization_percentage >= 90 or .memory_utilization_percentage\
+    \ >= 90))'\n}\n\n# Execute the function and save the output to a file\nprocess_nodes_and_usage\
+    \ > high_use_nodes.json\n\n# Output the contents of the generated file\ncat high_use_nodes.json\n"
+  name: identify_high_utilization_nodes_for_cluster_context
+  when_is_it_useful: '1. Identifying and troubleshooting performance issues in a Kubernetes
+    cluster, such as nodes with high CPU and memory utilization.
+
+
+    2. Optimizing resource allocation in a Kubernetes cluster to ensure efficient
+    use of resources and prevent overload on specific nodes.
+
+
+    3. Monitoring and maintaining the health and stability of a Kubernetes cluster
+    by identifying and addressing potential bottlenecks or capacity limitations.
+
+
+    4. Analyzing historical data on node usage to identify trends and patterns that
+    may indicate the need for scaling or rebalancing resources within the cluster.
+
+
+    5. Generating reports and insights on resource usage and allocation for stakeholders
+    and management to inform decision-making and resource planning within the Kubernetes
+    environment.'
+- command: bash 'pods_impacting_high_use_nodes.sh'
+  doc_links: '
+
+    - [CPU & Memory Allocations in Kubernetes](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/){:target="_blank"}
+
+    - [Normalizing Resource Metrics in Kubernetes](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#step-2-run-the-hpa){:target="_blank"}
+
+    - [Configuring Resource Requests in Kubernetes](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#resource-requests-and-limits-of-pod-and-container/){:target="_blank"}'
+  explanation: This script automates the monitoring of resource requests in a
+    Kubernetes cluster. It fetches CPU and memory allocations for nodes and pods,
+    collects actual usage, normalizes the units, and then compares usage against
+    the configured requests in order to identify pods that exceed their requested
+    resources.
+  multi_line_details: "\n#!/bin/bash\n\n# Define Kubernetes binary and context with\
+    \ dynamic defaults\nKUBERNETES_DISTRIBUTION_BINARY=\"${KUBERNETES_DISTRIBUTION_BINARY:-kubectl}\"\
+    \ # Default to 'kubectl' if not set in the environment\nDEFAULT_CONTEXT=$(${KUBERNETES_DISTRIBUTION_BINARY}\
+    \ config current-context)\nCONTEXT=\"${CONTEXT:-$DEFAULT_CONTEXT}\" # Use environment\
+    \ variable or the current context from kubectl\n\nprocess_nodes_and_usage() {\n\
+    \    # Get Node Details including allocatable resources\n    nodes=$(${KUBERNETES_DISTRIBUTION_BINARY}\
+    \ get nodes --context ${CONTEXT} -o json | jq '[.items[] | {\n        name: .metadata.name,\n\
+    \        cpu_allocatable: (.status.allocatable.cpu | rtrimstr(\"m\") | tonumber),\n\
+    \        memory_allocatable: (.status.allocatable.memory | gsub(\"Ki\"; \"\")\
+    \ | tonumber / 1024)\n    }]')\n\n    # Fetch node usage details\n    usage=$(${KUBERNETES_DISTRIBUTION_BINARY}\
+    \ top nodes --context ${CONTEXT} | awk 'BEGIN { printf \"[\" } NR>1 { printf \"\
+    %s{\\\"name\\\":\\\"%s\\\",\\\"cpu_usage\\\":\\\"%s\\\",\\\"memory_usage\\\":\\\
+    \"%s\\\"}\", (NR>2 ? \",\" : \"\"), $1, ($2 == \"<unknown>\" ? \"0\" : $2), ($4\
+    \ == \"<unknown>\" ? \"0\" : $4) } END { printf \"]\" }' | jq '.')\n\n    # Combine\
+    \ and process the data\n    jq -n --argjson nodes \"$nodes\" --argjson usage \"\
+    $usage\" '{\n        nodes: $nodes | map({name: .name, cpu_allocatable: .cpu_allocatable,\
+    \ memory_allocatable: .memory_allocatable}),\n        usage: $usage | map({name:\
+    \ .name, cpu_usage: (.cpu_usage | rtrimstr(\"m\") | tonumber // 0), \n       \
+    \ memory_usage: (.memory_usage | rtrimstr(\"Mi\") | tonumber // 0)})\n    } |\
+    \ .nodes as $nodes | .usage as $usage | \n    $nodes | map(\n        . as $node\
+    \ | \n        $usage[] | \n        select(.name == $node.name) | \n        {\n\
+    \            name: .name, \n            cpu_utilization_percentage: (.cpu_usage\
+    \ / $node.cpu_allocatable * 100),\n            memory_utilization_percentage:\
+    \ (.memory_usage / $node.memory_allocatable * 100)\n        }\n    ) | map(select(.cpu_utilization_percentage\
+    \ >= 90 or .memory_utilization_percentage >= 90))'\n}\n\n\n# Fetch pod resource\
+    \ requests\n${KUBERNETES_DISTRIBUTION_BINARY} get pods --context ${CONTEXT} --all-namespaces\
+    \ -o json | jq -r '.items[] | {namespace: .metadata.namespace, \npod: .metadata.name,\
+    \ nodeName: .spec.nodeName, cpu_request: (.spec.containers[].resources.requests.cpu\
+    \ // \"0m\"), memory_request: (.spec.containers[].resources.requests.memory //\
+    \ \"0Mi\")} \n| select(.cpu_request != \"0m\" and .memory_request != \"0Mi\")'\
+    \ | jq -s '.' > pod_requests.json\n\n\n# Fetch current pod metrics\n${KUBERNETES_DISTRIBUTION_BINARY}\
+    \ top pods --context ${CONTEXT} --all-namespaces --containers | awk 'BEGIN { printf\
+    \ \"[\" } \nNR>1 { printf \"%s{\\\"namespace\\\":\\\"%s\\\",\\\"pod\\\":\\\"%s\\\
+    \",\\\"container\\\":\\\"%s\\\",\\\"cpu_usage\\\":\\\"%s\\\",\\\"memory_usage\\\
+    \":\\\"%s\\\"}\", (NR>2 ? \",\" : \"\"), $1, $2, $3, $4, $5 } \nEND { printf \"\
+    ]\" }' | jq '.' > pod_usage.json\n\n\n# Normalize units and compare\njq -s '[\n\
+    \    .[0][] as $usage | \n    .[1][] | \n    select(.pod == $usage.pod and .namespace\
+    \ == $usage.namespace) |\n    {\n        pod: .pod,\n        namespace: .namespace,\n\
+    \        node: .nodeName,\n        cpu_usage: $usage.cpu_usage,\n        cpu_request:\
+    \ .cpu_request,\n        cpu_usage_exceeds: (\n            # Convert CPU usage\
+    \ to millicores, assuming all inputs need to be converted from milli-units if\
+    \ they end with 'm'\n            ($usage.cpu_usage | \n                if test(\"\
+    m$\") then rtrimstr(\"m\") | tonumber \n                else tonumber * 1000 \n\
+    \                end\n            ) > (\n                # Convert CPU request\
+    \ to millicores, assuming it may already be in millicores if it ends with 'm'\n\
+    \                .cpu_request | \n                if test(\"m$\") then rtrimstr(\"\
+    m\") | tonumber \n                else tonumber * 1000 \n                end\n\
+    \            )\n        ),\n        memory_usage: $usage.memory_usage,\n     \
+    \   memory_request: .memory_request,\n        memory_usage_exceeds: (\n      \
+    \      # Normalize memory usage to MiB, handling MiB and GiB\n            ($usage.memory_usage\
+    \ | \n                if test(\"Gi$\") then rtrimstr(\"Gi\") | tonumber * 1024\n\
+    \                elif test(\"G$\") then rtrimstr(\"G\") | tonumber * 1024\n  \
+    \              elif test(\"Mi$\") then rtrimstr(\"Mi\") | tonumber\n         \
+    \       elif test(\"M$\") then rtrimstr(\"M\") | tonumber\n                else\
+    \ tonumber\n                end\n            ) > (\n                # Normalize\
+    \ memory request to MiB\n                .memory_request | \n                if\
+    \ test(\"Gi$\") then rtrimstr(\"Gi\") | tonumber * 1024\n                elif\
+    \ test(\"G$\") then rtrimstr(\"G\") | tonumber * 1024\n                elif test(\"\
+    Mi$\") then rtrimstr(\"Mi\") | tonumber\n                elif test(\"M$\") then\
+    \ rtrimstr(\"M\") | tonumber\n                else tonumber\n                end\n\
+    \            )\n        )\n    }\n    | select(.cpu_usage_exceeds or .memory_usage_exceeds)\n\
+    ] | group_by(.namespace) | map({(.[0].namespace): .}) | add' pod_usage.json pod_requests.json\
+    \ > pods_exceeding_requests.json\n\ncat pods_exceeding_requests.json\n"
+  name: identify_pods_causing_high_node_utilization_in_cluster_context
+  when_is_it_useful: '1. Identifying and troubleshooting performance issues in a Kubernetes
+    cluster, such as high CPU or memory usage, by monitoring resource requests and
+    utilization.
+
+    2. Automating the identification of pods or nodes causing CrashLoopBackOff events
+    in a Kubernetes cluster by comparing resource requests and actual utilization.
+
+    3. Implementing proactive resource monitoring and optimization strategies in a
+    Kubernetes environment to prevent potential outages or service disruptions.
+
+    4. Ensuring compliance with resource allocation policies and best practices within
+    a Kubernetes infrastructure by regularly monitoring resource utilization.
+
+    5. Streamlining the process of identifying and addressing resource-intensive workloads
+    within a Kubernetes cluster to optimize overall system performance.'
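
A note on the unit handling in both embedded scripts: the jq filters strip a trailing `m` from `.status.allocatable.cpu` and `Ki` from `.status.allocatable.memory` before calling `tonumber`. When a node reports allocatable CPU in whole cores (for example `4`), that leaves the value at 4, while `kubectl top nodes` typically prints usage in millicores, so the computed percentage would be inflated. A minimal sketch of explicit normalization follows. The helper names `cpu_to_millicores`, `mem_to_mib`, and `utilization_pct` are hypothetical and not part of the codebundle, and only the `m`, `Ki`, `Mi`, `Gi`, and plain-byte forms are handled.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the codebundle) that normalize Kubernetes
# resource quantities before a utilization percentage is computed.

# Convert a CPU quantity such as "250m", "4", or "0.5" to millicores.
cpu_to_millicores() {
  local q="$1"
  case "$q" in
    *m) echo "${q%m}" ;;                                        # already millicores
    *)  awk -v v="$q" 'BEGIN { printf "%.0f\n", v * 1000 }' ;;  # whole or fractional cores
  esac
}

# Convert a memory quantity ("16384Ki", "512Mi", "2Gi", or plain bytes) to MiB.
mem_to_mib() {
  local q="$1"
  case "$q" in
    *Ki) awk -v v="${q%Ki}" 'BEGIN { printf "%.0f\n", v / 1024 }' ;;
    *Mi) echo "${q%Mi}" ;;
    *Gi) awk -v v="${q%Gi}" 'BEGIN { printf "%.0f\n", v * 1024 }' ;;
    *)   awk -v v="$q" 'BEGIN { printf "%.0f\n", v / 1048576 }' ;;  # assume bytes
  esac
}

# Print used/allocatable as a percentage with one decimal place.
# Prints 0.0 when allocatable is zero to avoid dividing by zero.
utilization_pct() {
  awk -v u="$1" -v a="$2" \
    'BEGIN { if (a > 0) printf "%.1f\n", u / a * 100; else print "0.0" }'
}

# Example: a node with 4 allocatable cores that currently uses 3700m.
cpu_pct=$(utilization_pct "$(cpu_to_millicores 3700m)" "$(cpu_to_millicores 4)")
echo "cpu utilization: ${cpu_pct}%"
```

With both sides in millicores, the 90% threshold used by the scripts applies to comparable numbers. The same conversion could be expressed inside the jq filters with a `test("m$")` branch, as the pod comparison in `pods_impacting_high_use_nodes.sh` already does for CPU.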
diff --git a/codebundles/k8s-loki-healthcheck/meta.yaml b/codebundles/k8s-loki-healthcheck/meta.yaml
@@ -48,8 +48,6 @@ commands:
 
     - [kubectl command](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands){:target="_blank"}
 
-    - [wget command](https://www.gnu.org/software/wget/manual/wget.html){:target="_blank"}
-
     - [JSON data retrieval](https://www.json.org/json-en.html){:target="_blank"}
 
     - [jq filtering](https://stedolan.github.io/jq/manual/){:target="_blank"}'