Commit

Fix/gcp node preempt (#302)
* update nodepreempt

* update cb

* fix issue next steps

* remove next step
stewartshea authored Jan 20, 2024
1 parent 746c863 commit 306006b
Showing 2 changed files with 27 additions and 17 deletions.
21 changes: 14 additions & 7 deletions codebundles/gcloud-node-preempt/runbook.robot
@@ -1,5 +1,5 @@
*** Settings ***
Documentation List all GCP nodes that have an active preempt operation.
Documentation List all GCP nodes that have been preempted in the previous time interval.
Metadata Author stewartshea
Metadata Display Name GCP Node Prempt List
Metadata Supports GCP,GKE
@@ -15,21 +15,21 @@ Suite Setup Suite Initialization

*** Tasks ***
List all nodes in an active prempt operation for GCP Project `${GCP_PROJECT_ID}`
[Documentation] Fetches all nodes that have an active preempt operation at a global scope in the GCP Project
[Tags] stdout gcloud node preempt gcp ${GCP_PROJECT_ID}
[Documentation] Fetches all nodes that have been preempted within the defined time interval.
[Tags] stdout gcloud node preempt gcp ${gcp_project_id}
${preempt_node_list}= RW.CLI.Run Cli
... cmd=gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS && gcloud compute operations list --filter="operationType:(compute.instances.preempted) AND progress<100" --format=json --project=${GCP_PROJECT_ID} | jq '[.[] | {startTime,targetLink, statusMessage, progress, zone, selfLink}]'
... cmd=gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS && gcloud compute operations list --filter='operationType:(compute.instances.preempted)' --format=json --project=${GCP_PROJECT_ID} | jq -r --arg now "$(date -u +%s)" '[.[] | select((.startTime | sub("\\\\.[0-9]+"; "") | strptime("%Y-%m-%dT%H:%M:%S%z") | mktime) > ($now | tonumber - (${AGE}*60)))] '
... env=${env}
... secret_file__gcp_credentials_json=${gcp_credentials_json}
... show_in_rwl_cheatsheet=true
... timeout_seconds=180
${no_requests_count}= RW.CLI.Parse Cli Json Output
... rsp=${preempt_node_list}
... extract_path_to_var__preempt_node_count=length(@)
... set_issue_title=Found nodes in an active preempt operation for Project `${GCP_PROJECT_ID}`
... set_severity_level=3
... set_issue_title=Found nodes that were preempted in the last ${AGE} minutes for Project `${GCP_PROJECT_ID}`
... set_severity_level=4
... preempt_node_count__raise_issue_if_gt=0
... set_issue_details=Preempt operations are active on GCP nodes in this project ${GCP_PROJECT_ID}. We found $preempt_node_count nodes in preempt. If services are degraded, modify the node pool or deployment replica configurations, otherwise grab a coffee or take a walk.
... set_issue_details=Preempt operations are active on GCP nodes in this project ${GCP_PROJECT_ID}. We found $preempt_node_count nodes that preempted in the last ${AGE} minutes. If services are degraded, modify the node pool or deployment replica configurations. The following events occured: ${preempt_node_list.stdout}
... assign_stdout_from_var=preempt_node_count
${history}= RW.CLI.Pop Shell History
RW.Core.Add Pre To Report Total nodes in a preempt operation: ${no_requests_count.stdout}
@@ -49,9 +49,16 @@ Suite Initialization
... description=The GCP Project ID to scope the API to.
... pattern=\w*
... example=myproject-ID
${AGE}= RW.Core.Import User Variable AGE
... type=string
... description=The age, in minutes, since the preempt event.
... pattern=\d+
... default=15
... example=15
${OS_PATH}= Get Environment Variable PATH
Set Suite Variable ${GCP_PROJECT_ID} ${GCP_PROJECT_ID}
Set Suite Variable ${gcp_credentials_json} ${gcp_credentials_json}
Set Suite Variable ${AGE} ${AGE}
Set Suite Variable
... ${env}
... {"CLOUDSDK_CORE_PROJECT":"${GCP_PROJECT_ID}","GOOGLE_APPLICATION_CREDENTIALS":"./${gcp_credentials_json.key}","PATH":"$PATH:${OS_PATH}"}
23 changes: 13 additions & 10 deletions codebundles/gcloud-node-preempt/sli.robot
@@ -1,6 +1,6 @@
*** Settings ***
Metadata Author stewartshea
Documentation Check if any GCP nodes have an active preempt operation.
Documentation Counts nodes that have been preempted within the defined time interval.
Metadata Display Name GCP Node Prempt List
Metadata Supports GCP,GKE
Suite Setup Suite Initialization
@@ -22,23 +22,26 @@ Suite Initialization
... description=The GCP Project ID to scope the API to.
... pattern=\w*
... example=myproject-ID
${AGE}= RW.Core.Import User Variable AGE
... type=string
... description=The age, in minutes, since the preempt event.
... pattern=\d+
... default=15
... example=15
${OS_PATH}= Get Environment Variable PATH
Set Suite Variable ${gcp_credentials_json} ${gcp_credentials_json}
Set Suite Variable ${GCP_PROJECT_ID} ${GCP_PROJECT_ID}
Set Suite Variable ${gcp_credentials_json} ${gcp_credentials_json}
Set Suite Variable ${AGE} ${AGE}
Set Suite Variable ${env} {"CLOUDSDK_CORE_PROJECT":"${GCP_PROJECT_ID}","GOOGLE_APPLICATION_CREDENTIALS":"./${gcp_credentials_json.key}", "PATH":"$PATH:${OS_PATH}"}


*** Tasks ***
Count the number of nodes in active prempt operation
[Documentation] Fetches all nodes that have an active preempt operation at a global scope in the GCP Project
[Documentation] Counts all nodes that have been preempted within the defined time interval.
[Tags] Stdout gcloud node preempt gcp
${preempt_node_list}= RW.CLI.Run Cli
... cmd=gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS && gcloud compute operations list --filter="operationType:(compute.instances.preempted) AND progress<100" --format=json --project=${GCP_PROJECT_ID}
... cmd=gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS && gcloud compute operations list --filter='operationType:(compute.instances.preempted)' --format=json --project=${GCP_PROJECT_ID} | jq -r --arg now "$(date -u +%s)" '[.[] | select((.startTime | sub("\\\\.[0-9]+"; "") | strptime("%Y-%m-%dT%H:%M:%S%z") | mktime) > ($now | tonumber - (${AGE}*60)))] | length'
... env=${env}
... secret_file__gcp_credentials_json=${gcp_credentials_json}
${no_requests_count}= RW.CLI.Parse Cli Json Output
... rsp=${preempt_node_list}
... extract_path_to_var__preempt_node_count=length(@)
... assign_stdout_from_var=preempt_node_count
... timeout_seconds=180
${metric}= Convert To Number ${no_requests_count.stdout}
${metric}= Convert To Number ${preempt_node_list.stdout}
RW.Core.Push Metric ${metric}
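
Note on the new jq filter: the time-window check it performs can be sketched in Python for readers who want to trace the logic. This is a minimal illustration, not the codebundle's implementation. `parse_start_time` and `recent_preemptions` are hypothetical names, the sample operations are made up, and the sketch assumes `startTime` is an RFC 3339 string with optional fractional seconds and a numeric UTC offset, as the strptime format in the diff implies.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_start_time(value):
    """Parse a timestamp such as 2024-01-20T10:20:00.123-08:00 (assumed format)."""
    # Drop fractional seconds, mirroring the sub() call in the jq filter.
    trimmed = re.sub(r"\.[0-9]+", "", value)
    # %z accepts offsets written with a colon on Python 3.7 and later.
    return datetime.strptime(trimmed, "%Y-%m-%dT%H:%M:%S%z")


def recent_preemptions(operations, age_minutes, now=None):
    """Keep operations whose startTime falls within the last age_minutes."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(minutes=age_minutes)
    return [op for op in operations if parse_start_time(op["startTime"]) > cutoff]


# Hypothetical sample data shaped like `gcloud compute operations list --format=json`.
sample_operations = [
    {"startTime": "2024-01-20T10:20:00.123-08:00", "targetLink": "node-a"},
    {"startTime": "2024-01-20T09:00:00.000-08:00", "targetLink": "node-b"},
]

# Fix "now" so the example is reproducible; AGE defaults to 15 minutes in the diff.
fixed_now = datetime(2024, 1, 20, 18, 30, 0, tzinfo=timezone.utc)
recent = recent_preemptions(sample_operations, age_minutes=15, now=fixed_now)
print(len(recent), [op["targetLink"] for op in recent])
```

In the diff, the runbook reports the filtered list itself, while the SLI appends `| length` and pushes only the count as its metric.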
