New indexer codebundles and tweaks (#293)
* initial gcp index test

* scale back test templates

* remove commented items from template (unsupported)

* test with gcp compute instances

* test path based option

* play with qualifiers

* keep debugging path

* try just simple path

* gcp test

* test quotes

* back to status

* try node preempt

* clean up gcp rule

* add additional slx context

* test context update

* update gcp test rule

* update desc

* debug templates

* attempt runbook debug

* update command

* add slo back in

* update meta

* add az lb health codebundle back

* initial cloud function cb test

* minor format fixes

* add log fetching

* fix trailing url .

* add template data

* add templates back

* modify filter to negate ACTIVE state

* fix http latency issue

* update reproduce hint

* deployment log fix. cloud function gen1/2 support and log issue refactor

* fix message processing and add gen 1 status check in sli

* minor deployment log adjustment

* change additional filter to uppercase OR
stewartshea authored Jan 12, 2024
1 parent 0c29f88 commit 05f37ac
Showing 25 changed files with 698 additions and 297 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/generate-index.yml
@@ -36,7 +36,7 @@ jobs:
python3 .github/scripts/index.py .github/scripts/index-config.yaml
git add README.md
# Run the index update script
# Run the metadata generation script
python3 .github/scripts/meta.py codebundles/
git add codebundles/**/meta.yaml
@@ -0,0 +1,28 @@
apiVersion: runwhen.com/v1
kind: GenerationRules
spec:
platform: azure
generationRules:
- resourceTypes:
- azure_network_load_balancers
matchRules:
- type: and
matches:
- type: pattern
pattern: ".+"
properties: [name]
mode: substring
- resourceType: variables
type: pattern
pattern: ".+"
properties: [custom/azure_credentials_secret_name]
mode: exact
slxs:
- baseName: az-lb-health
levelOfDetail: basic
qualifiers: [resource, resource_group]
baseTemplateName: az-lb-health
outputItems:
- type: slx
- type: runbook
templateName: az-lb-health-taskset.yaml
@@ -0,0 +1,22 @@
apiVersion: runwhen.com/v1
kind: ServiceLevelX
metadata:
name: {{slx_name}}
labels:
{% include "common-labels.yaml" %}
annotations:
{% include "common-annotations.yaml" %}
spec:
imageURL: https://storage.googleapis.com/runwhen-nonprod-shared-images/icons/azure/networking/10062-icon-service-Load-Balancers.svg
alias: {{match_resource.name}} Azure Load Balancer Health
asMeasuredBy: "Querying the Azure Load Balancer health for incidents or critical events."
configProvided:
- name: OBJECT_NAME
value: {{match_resource.name}}
owners:
- {{workspace.owner_email}}
statement: Ensure Azure Network Load Balancers are healthy.
additionalContext:
tags: "{{match_resource.tags}}"
qualified_name: "{{ match_resource.qualified_name }}"
resource_group: "{{ match_resource.resource_group.name }}"
@@ -0,0 +1,34 @@
## Needs tuning as soon as we can match on application resource
apiVersion: runwhen.com/v1
kind: Runbook
metadata:
name: {{slx_name}}
labels:
{% include "common-labels.yaml" %}
annotations:
{% include "common-annotations.yaml" %}
spec:
location: {{default_location}}
codeBundle:
{% if repo_url %}
repoUrl: {{repo_url}}
{% else %}
repoUrl: https://github.com/runwhen-contrib/rw-cli-codecollection.git
{% endif %}
{% if ref %}
ref: {{ref}}
{% else %}
ref: main
{% endif %}
pathToRobot: codebundles/azure-loadbalancer-triage/runbook.robot
configProvided:
- name: AZ_HISTORY_RANGE
value: '24'
- name: AZ_LB_NAME
value: {{match_resource.name}}
- name: AZ_LB_ID
value: {{match_resource.id}}
secretsProvided:
- name: azure_credentials
workspaceKey: {{custom.azure_credentials_secret_name}}
servicesProvided: []
2 changes: 1 addition & 1 deletion codebundles/curl-http-ok/runbook.robot
@@ -44,7 +44,7 @@ Checking HTTP URL Is Available And Timely
... set_issue_title=Actual HTTP latency exceeded target latency for ${owner_kind.stdout} `${owner_name.stdout}`
... set_severity_level=4
... time_total__raise_issue_if_gt=${TARGET_LATENCY}
... set_issue_next_steps=Check ${owner_kind} Log for Issues with `${owner_name}`
... set_issue_next_steps=Check ${owner_kind.stdout} Log for Issues with `${owner_name.stdout}`
... set_issue_details=${URL} responded with a latency of $time_total. Check services, pods, load balanacers, and virtual machines for unexpected saturation.
... assign_stdout_from_var=time_total
${history}= RW.CLI.Pop Shell History
@@ -1,29 +1,20 @@
apiVersion: runwhen.com/v1
kind: GenerationRules
spec:
platform: gcp
generationRules:
- resourceTypes:
- cluster
- gcp_compute_instances
matchRules:
- type: and
matches:
- resourceType: variables
type: pattern
pattern: "gcp"
properties: [custom/cloud_provider]
mode: substring
## TODO
## See if we can match on cloud.google.com/gke-preemptible=true labels on nodes
- resourceType: variables
type: pattern
pattern: "true"
properties: [custom/gcp_preempt_nodes]
mode: exact
- type: pattern
pattern: ".+"
properties: [scheduling/preemptible]
mode: substring
slxs:
- baseName: node-preempt
qualifiers: ["resource", "cluster"]
qualifiers: ["project"]
baseTemplateName: gcloud-node-preempt
levelOfDetail: detailed
levelOfDetail: basic
outputItems:
- type: slx
- type: sli
@@ -28,10 +28,7 @@ spec:
intervalSeconds: 300
configProvided:
- name: GCP_PROJECT_ID
value: {{custom.gcp_project_id}}
value: {{match_resource.resource.project_id}}
secretsProvided:
- name: gcp_credentials_json
workspaceKey: {{custom.gcp_ops_suite_sa}}
servicesProvided:
- name: gcloud
locationServiceName: gcloud-service.shared
workspaceKey: {{custom.gcp_ops_suite_sa}}
@@ -8,14 +8,13 @@ metadata:
{% include "common-annotations.yaml" %}
spec:
imageURL: https://storage.googleapis.com/runwhen-nonprod-shared-images/icons/google-cloud.svg
alias: GCP Node Preempt Events
asMeasuredBy: The number of GCP nodes active in a preempt operation.
alias: GCP Node Preempt Events for Project {{match_resource.resource.project_id}}
asMeasuredBy: The number of GCP nodes active in a preempt operation in project {{match_resource.resource.project_id}}.
configProvided:
- name: SLX_PLACEHOLDER
value: SLX_PLACEHOLDER
owners:
- {{workspace.owner_email}}
statement: Nodes in an active preempt event should trigger RunWhen workflows.
additionalContext:
cluster: "{{ cluster.name }}"
context: "{{ cluster.context }}"
additionalContext:
project: "{{match_resource.resource.project_id}}"
@@ -9,7 +9,7 @@ metadata:
spec:
locations:
- {{default_location}}
description: Measures ____
description: Counts the total number of nodes undergoing a preempt event.
codeBundle:
{% if repo_url %}
repoUrl: {{repo_url}}
@@ -26,10 +26,7 @@ spec:
intervalSeconds: 300
configProvided:
- name: GCP_PROJECT_ID
value: {{custom.gcp_project_id}}
value: {{match_resource.resource.project_id}}
secretsProvided:
- name: gcp_credentials_json
workspaceKey: {{custom.gcp_ops_suite_sa}}
servicesProvided:
- name: gcloud
locationServiceName: gcloud-service.shared
workspaceKey: {{custom.gcp_ops_suite_sa}}
55 changes: 18 additions & 37 deletions codebundles/gcloud-node-preempt/meta.yaml
@@ -1,43 +1,24 @@
commands:
- command: gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS
&& gcloud compute operations list --filter="operationType:( compute.instances.preempted
) AND NOT status:( DONE )" --format=json --project=${GCP_PROJECT_ID} | jq '[.[]
| {startTime,targetLink, statusMessage, progress, zone, selfLink}]'
&& gcloud compute operations list --filter="operationType:(compute.instances.preempted)
AND progress<100" --format=json --project=${GCP_PROJECT_ID} | jq '[.[] | {startTime,targetLink,
statusMessage, progress, zone, selfLink}]'
doc_links: '
- [Service accounts on GCP](https://cloud.google.com/iam/docs/service-accounts){:target="_blank"}
- [Google Cloud service accounts](https://cloud.google.com/iam/docs/creating-managing-service-accounts){:target="_blank"}
- [Preemptible VM instances on GCP](https://cloud.google.com/compute/docs/instances/preemptible){:target="_blank"}'
explanation: This command activates a service account for authentication and then
lists all preempted compute instances that are not yet completed in a Google Cloud
Platform project, displaying specific information about each instance in JSON
format. The jq filter is used to select and format the specific fields of interest.
multi_line_details: "# Activate the service account using the key file from the\
\ Google Application Credentials\ngcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS\
\ && \n\n# List all compute operations with specific filters and output in JSON\
\ format\ngcloud compute operations list --filter=\"operationType:( compute.instances.preempted\
\ ) AND NOT status:( DONE )\" --format=json --project=${GCP_PROJECT_ID} |\n\n\
# Use jq to filter and customize the output to display only certain fields\njq\
\ '[.[] | {startTime,targetLink, statusMessage, progress, zone, selfLink}]'"
name: list_all_nodes_in_an_active_prempt_operation
- command: gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS
&& gcloud compute operations list --filter="operationType:( compute.instances.preempted
) AND NOT status:( DONE )" --format=json --project=${GCP_PROJECT_ID} | jq '[.[]
| {startTime,targetLink, statusMessage, progress, zone, selfLink}]'
doc_links: '
- [Service accounts on GCP](https://cloud.google.com/iam/docs/service-accounts){:target="_blank"}
- [Preemptible VM instances in Google Cloud](https://cloud.google.com/preemptible-vms){:target="_blank"}
- [Preemptible VM instances on GCP](https://cloud.google.com/compute/docs/instances/preemptible){:target="_blank"}'
explanation: This command activates a service account for authentication and then
lists all preempted compute instances that are not yet completed in a Google Cloud
Platform project, displaying specific information about each instance in JSON
format. The jq filter is used to select and format the specific fields of interest.
multi_line_details: "# Activate the service account using the key file from the\
\ Google Application Credentials\ngcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS\
\ && \n\n# List all compute operations with specific filters and output in JSON\
\ format\ngcloud compute operations list --filter=\"operationType:( compute.instances.preempted\
\ ) AND NOT status:( DONE )\" --format=json --project=${GCP_PROJECT_ID} |\n\n\
# Use jq to filter and customize the output to display only certain fields\njq\
\ '[.[] | {startTime,targetLink, statusMessage, progress, zone, selfLink}]'"
name: list_all_nodes_in_an_active_prempt_operation
- [jq documentation](https://stedolan.github.io/jq/manual/){:target="_blank"}'
explanation: This command activates a service account for Google Cloud and then
lists all preempted compute instances in a specific project, outputting the results
in JSON format with specific attributes using jq.
multi_line_details: "\n# Authenticate using a service account key\ngcloud auth activate-service-account\
\ --key-file=$GOOGLE_APPLICATION_CREDENTIALS \n\n# List compute operations for\
\ preempted instances with less than 100% progress in JSON format\n# Filter by\
\ operationType and progress, and specify the project\ngcloud compute operations\
\ list --filter=\"operationType:(compute.instances.preempted) AND progress<100\"\
\ --format=json --project=${GCP_PROJECT_ID} \n\n# Pipe the output to 'jq' to format\
\ the JSON response with specific fields\n| jq '[.[] | {startTime,targetLink,\
\ statusMessage, progress, zone, selfLink}]'\n"
name: list_all_nodes_in_an_active_prempt_operation_for_gcp_project_gcp_project_id
14 changes: 3 additions & 11 deletions codebundles/gcloud-node-preempt/runbook.robot
@@ -14,15 +14,14 @@ Suite Setup Suite Initialization


*** Tasks ***
List all nodes in an active prempt operation
List all nodes in an active prempt operation for GCP Project `${GCP_PROJECT_ID}`
[Documentation] Fetches all nodes that have an active preempt operation at a global scope in the GCP Project
[Tags] stdout gcloud node preempt gcp
[Tags] stdout gcloud node preempt gcp ${GCP_PROJECT_ID}
${preempt_node_list}= RW.CLI.Run Cli
... cmd=gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS && gcloud compute operations list --filter="operationType:( compute.instances.preempted ) AND NOT status:( DONE )" --format=json --project=${GCP_PROJECT_ID} | jq '[.[] | {startTime,targetLink, statusMessage, progress, zone, selfLink}]'
... cmd=gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS && gcloud compute operations list --filter="operationType:(compute.instances.preempted) AND progress<100" --format=json --project=${GCP_PROJECT_ID} | jq '[.[] | {startTime,targetLink, statusMessage, progress, zone, selfLink}]'
... env=${env}
... secret_file__gcp_credentials_json=${gcp_credentials_json}
... show_in_rwl_cheatsheet=true
... render_in_commandlist=true
${no_requests_count}= RW.CLI.Parse Cli Json Output
... rsp=${preempt_node_list}
... extract_path_to_var__preempt_node_count=length(@)
@@ -39,12 +38,6 @@ List all nodes in an active prempt operation

*** Keywords ***
Suite Initialization
${GCLOUD_SERVICE}= RW.Core.Import Service gcloud
... type=string
... description=The selected RunWhen Service to use for accessing services within a network.
... pattern=\w*
... example=gcloud-service.shared
... default=gcloud-service.shared
${gcp_credentials_json}= RW.Core.Import Secret gcp_credentials_json
... type=string
... description=GCP service account json used to authenticate with GCP APIs.
@@ -57,7 +50,6 @@ Suite Initialization
... example=myproject-ID
${OS_PATH}= Get Environment Variable PATH
Set Suite Variable ${GCP_PROJECT_ID} ${GCP_PROJECT_ID}
Set Suite Variable ${GCLOUD_SERVICE} ${GCLOUD_SERVICE}
Set Suite Variable ${gcp_credentials_json} ${gcp_credentials_json}
Set Suite Variable
... ${env}
10 changes: 1 addition & 9 deletions codebundles/gcloud-node-preempt/sli.robot
@@ -12,12 +12,6 @@ Library OperatingSystem

*** Keywords ***
Suite Initialization
${GCLOUD_SERVICE}= RW.Core.Import Service gcloud
... type=string
... description=The selected RunWhen Service to use for accessing services within a network.
... pattern=\w*
... example=gcloud-service.shared
... default=gcloud-service.shared
${gcp_credentials_json}= RW.Core.Import Secret gcp_credentials_json
... type=string
... description=GCP service account json used to authenticate with GCP APIs.
@@ -28,7 +22,6 @@ Suite Initialization
... description=The GCP Project ID to scope the API to.
... pattern=\w*
... example=myproject-ID
Set Suite Variable ${GCLOUD_SERVICE} ${GCLOUD_SERVICE}
Set Suite Variable ${gcp_credentials_json} ${gcp_credentials_json}
Set Suite Variable ${env} {"CLOUDSDK_CORE_PROJECT":"${GCP_PROJECT_ID}","GOOGLE_APPLICATION_CREDENTIALS":"./${gcp_credentials_json.key}"}

@@ -38,8 +31,7 @@ Count the number of nodes in active prempt operation
[Documentation] Fetches all nodes that have an active preempt operation at a global scope in the GCP Project
[Tags] Stdout gcloud node preempt gcp
${preempt_node_list}= RW.CLI.Run Cli
... cmd=gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS && gcloud compute operations list --filter="operationType:( compute.instances.preempted ) AND NOT status:( DONE )" --format=json --project=${GCP_PROJECT_ID}
... target_service=${GCLOUD_SERVICE}
... cmd=gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS && gcloud compute operations list --filter="operationType:(compute.instances.preempted) AND progress<100" --format=json --project=${GCP_PROJECT_ID}
... env=${env}
... secret_file__gcp_credentials_json=${gcp_credentials_json}
${no_requests_count}= RW.CLI.Parse Cli Json Output
@@ -0,0 +1,23 @@
apiVersion: runwhen.com/v1
kind: GenerationRules
spec:
platform: gcp
generationRules:
- resourceTypes:
- gcp_functions_functions
matchRules:
- type: pattern
pattern: ".+"
properties: [name]
mode: substring
slxs:
- baseName: gcp-function-health
qualifiers: ["project"]
baseTemplateName: gcp-cloud-function-health
levelOfDetail: basic
outputItems:
- type: slx
- type: sli
- type: slo
- type: runbook
templateName: gcp-cloud-function-health-taskset.yaml
@@ -0,0 +1,34 @@
apiVersion: runwhen.com/v1
kind: ServiceLevelIndicator
metadata:
name: {{slx_name}}
labels:
{% include "common-labels.yaml" %}
annotations:
{% include "common-annotations.yaml" %}
spec:
displayUnitsLong: Number
displayUnitsShort: '#'
locations:
- {{default_location}}
description: Measures ____
codeBundle:
{% if repo_url %}
repoUrl: {{repo_url}}
{% else %}
repoUrl: https://github.com/runwhen-contrib/rw-cli-codecollection.git
{% endif %}
{% if ref %}
ref: {{ref}}
{% else %}
ref: main
{% endif %}
pathToRobot: codebundles/gcp-cloud-function-health/sli.robot
intervalStrategy: intermezzo
intervalSeconds: 300
configProvided:
- name: GCP_PROJECT_ID
value: {{match_resource.resource.project_id}}
secretsProvided:
- name: gcp_credentials_json
workspaceKey: {{custom.gcp_ops_suite_sa}}