RFC: tiered missing replicas alerts #61

Closed
wants to merge 1 commit into from

@a-june a-june commented Nov 8, 2023

RFC: Using tiering in stock alerts.

This is a proposal and a request for comments on including tier information in stock alerts.

👍
All alerts are important. The goal of separating them by tier is to let us provide a better escalation policy for business-hours and out-of-hours support, depending on the impact.

👎
There is a drawback here: each Statefulset|DeploymentMissingXReplicas alert is replaced by what are, in practice, 5 separate copies,
TierUnknown|1|2|3|4DeploymentMissingXReplicas, but that's what makes them useful downstream.

Notes

  • Unknown Tier alerts should be treated as Tier 1 alerts.
  • keep_firing_for - this is meant to mitigate flapping alerts in situations where a Deployment/StatefulSet comes up for a short period of time and then fails quickly.
  • The proposal covers only Deployment and StatefulSet, to provide working examples: currently only those have the annotations mentioned here whitelisted. This can easily be extended to DaemonSets.
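
To make the first note concrete, here is a minimal sketch of how downstream tooling might resolve the tier of an alert. It assumes the tier arrives as a label named `tier` with values "1" to "4"; the label name and the helper `effective_tier` are hypothetical and not part of this PR.

```python
from typing import Mapping

# Assumption: tiers are carried as string label values "1".."4".
KNOWN_TIERS = {"1", "2", "3", "4"}


def effective_tier(labels: Mapping[str, str]) -> int:
    """Return the tier to escalate on.

    A missing or unrecognised tier is treated as Tier 1, as the note
    above asks, so that untagged workloads are never under-escalated.
    """
    raw = labels.get("tier", "").strip()  # "tier" is a hypothetical label name
    return int(raw) if raw in KNOWN_TIERS else 1
```

Falling back to Tier 1 rather than Tier 4 is the conservative choice: a workload nobody has classified yet gets the strictest escalation until someone assigns it a tier.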

Tier summary

Tier 1
Mission critical service or repository. Failure could result in significant impact to revenue or reputation.

Tier 2
Customer-facing service or repository. Failure results in degraded experience for customers, although without significant impact for revenue or reputation.

Tier 3
Internal service or repository. Failure could result in productivity being compromised within the company.

Tier 4
Other service or repository. Failure doesn't result in immediate or significant impact.

@a-june a-june closed this Nov 9, 2023

a-june commented Nov 9, 2023

There is a drawback here: each Statefulset|DeploymentMissingXReplicas alert is replaced by what are, in practice, 5 separate copies,
TierUnknown|1|2|3|4DeploymentMissingXReplicas, but that's what makes them useful downstream.

Wrong approach - there is no need for a different alert name per tier when we can use the labels provided in the alert in Grafana itself.
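
As a rough illustration of the label-based idea, the sketch below groups alerts by a tier label while the alert name stays the same for every tier. It assumes each alert is a flat mapping of labels that includes the standard `alertname` label plus a `tier` label; the `tier` label name and the helper `group_by_tier` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def group_by_tier(alerts: Iterable[Mapping[str, str]]) -> Dict[str, List[str]]:
    """Group alert names by their tier label; alerts without one land under "unknown"."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for labels in alerts:
        tier = labels.get("tier", "unknown")  # "tier" is a hypothetical label name
        grouped[tier].append(labels["alertname"])
    return dict(grouped)


alerts = [
    {"alertname": "DeploymentMissingXReplicas", "tier": "1"},
    {"alertname": "StatefulsetMissingXReplicas", "tier": "3"},
    {"alertname": "DeploymentMissingXReplicas"},  # no tier label set
]
by_tier = group_by_tier(alerts)
```

One alert name per workload kind is enough here: the tier travels with the alert as data, so a dashboard or routing rule can filter on it without the rule set growing five-fold.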

New PR including information: #62
