diff --git a/docs/en/observability/apm-alerts.asciidoc b/docs/en/observability/apm-alerts.asciidoc deleted file mode 100644 index 34917de315..0000000000 --- a/docs/en/observability/apm-alerts.asciidoc +++ /dev/null @@ -1,170 +0,0 @@ -[[apm-alerts]] -= APM alerts and rules - -The Applications UI allows you to define **rules** to detect complex conditions within your APM data -and trigger built-in **actions** when those conditions are met. - -The following **rules** are supported: - -* **Threshold rule**: -Alert when the latency or failed transaction rate is abnormal. -Threshold rules can be as broad or as granular as you'd like, enabling you to define exactly when you want to be alerted--whether that's at the environment level, service name level, transaction type level, and/or transaction name level. -* **Anomaly rule**: -Alert when either the latency of a service is anomalous. Anomaly rules can be set at the environment level, service level, and/or transaction type level. -* **Error count rule**: -Alert when the number of errors in a service exceeds a defined threshold. Error count rules can be set at the environment level, service level, and error group level. - -[role="screenshot"] -image::./images/apm-alert.png[Create an alert in the Applications UI] - -Below, we'll walk through the creation of two APM rules. - -For a complete walkthrough of the **Create rule** flyout panel, including detailed information on each configurable property, -see Kibana's {kibana-ref}/create-and-manage-rules.html[Create and manage rules]. - -[float] -[[apm-create-transaction-alert]] -== Example: create a latency anomaly rule - -Latency anomaly rules trigger when the latency of a service is abnormal. -Because some parts of an application are more important than others, and have a different -tolerance for latency, we'll target a specific transaction within a service. - -Before continuing, identify the service name, transaction type, and environment that you'd like to create a latency anomaly rule for. -This guide will create an alert for all services based on the following criteria: - -* Service: `{your_service.name}` -* Transaction: `{your_transaction.name}` -* Environment: `{your_service.environment}` -* Severity level: critical -* Check every five minutes -* Send an alert to a Slack channel when the rule status changes - -From any page in the Applications UI, select **Alerts and rules** > **Create anomaly rule**. -Change the name of the rule, but do not edit the tags. - -Based on the criteria above, define the following rule details: - -* **Service** - `{your_service.name}` -* **Type** - `{your_transaction.name}` -* **Environment** - `{your_service.environment}` -* **Has anomaly with severity** - `critical` -* **Check every** - `5 minutes` - -Next, add a connector type. Multiple connectors can be selected, but in this example we're interested in Slack. -Select **Slack** > **Create a connector**. -Enter a name for the connector, -and paste your Slack webhook URL. -See Slack's webhook documentation if you need to create one. - -A default message is provided as a starting point for your alert. -You can use the https://mustache.github.io/[Mustache] template syntax, i.e., `{{variable}}` -to pass additional alert values at the time a condition is detected to an action. -A list of available variables can be accessed by selecting the -**add variable** button image:./images/add-variable.png[add variable button]. - -Click **Save**. Your rule has been created and is now active! 
- -[float] -[[apm-create-error-alert]] -== Example: create an error count threshold alert - -The error count threshold alert triggers when the number of errors in a service exceeds a defined threshold. -Because some errors are more important than others, this guide will focus a specific error group ID. - -Before continuing, identify the service name, environment name, and error group ID that you'd like to create a latency anomaly rule for. -The easiest way to find an error group ID is to select the service that you're interested in and navigating to the **Errors** tab. - -This guide will create an alert for an error group ID based on the following criteria: - -* Service: `{your_service.name}` -* Environment: `{your_service.environment}` -* Error Grouping Key: `{your_error.ID}` -* Error rate is above 25 errors for the last five minutes -* Group alerts by `service.name` and `service.environment` -* Check every 1 minute -* Send the alert via email to the site reliability team - -From any page in the Applications UI, select **Alerts and rules** > **Create error count rule**. -Change the name of the alert, but do not edit the tags. - -Based on the criteria above, define the following rule details: - -* **Service**: `{your_service.name}` -* **Environment**: `{your_service.environment}` -* **Error Grouping Key**: `{your_error.ID}` -* **Is above** - `25 errors` -* **For the last** - `5 minutes` -* **Group alerts by** - `service.name` `service.environment` -* **Check every** - `1 minute` - -[NOTE] -==== -Alternatively, you can use a KQL filter to limit the scope of the alert: - -. Toggle on *Use KQL Filter*. -. Add a filter, for example to achieve the same effect as the example above: -+ -[source,txt] ------- -service.name:"{your_service.name}" and service.environment:"{your_service.environment}" and error.grouping_key:"{your_error.ID}" ------- - -Using a KQL Filter to limit the scope is available for _Latency threshold_, _Failed transaction rate threshold_, and -_Error count threshold_ rules. -==== - -Select the **Email** connector and click **Create a connector**. -Fill out the required details: sender, host, port, etc., and click **save**. - -A default message is provided as a starting point for your alert. -You can use the https://mustache.github.io/[Mustache] template syntax, i.e., `{{variable}}` -to pass additional alert values at the time a condition is detected to an action. -A list of available variables can be accessed by selecting the -**add variable** button image:./images/add-variable.png[add variable button]. - -Click **Save**. The alert has been created and is now active! - -[float] -[[apm-alert-view-active]] -== View active alerts - -Active alerts are displayed and grouped in multiple ways in the Applications UI. - -[float] -[[apm-alert-view-group]] -=== View alerts by service group - -If you're using the <> feature, you can view alerts by service group. -From the service group overview page, click the red alert indicator to open the **Alerts** tab with a predefined filter that matches the filter used when creating the service group. - -[role="screenshot"] -image::./images/apm-service-group.png[Example view of service group in the Applications UI in Kibana] - -[float] -[[apm-alert-view-service]] -=== View alerts by service - -Alerts can be viewed within the context of any service. -After selecting a service, go to the **Alerts** tab to view any alerts that are active for the selected service. 
- -[role="screenshot"] -image::./images/active-alert-service.png[View active alerts by service] - -[float] -[[apm-alert-view-service]] hold on -[float] -[[apm-alert-manage]] -== Manage alerts and rules - -From the Applications UI, select **Alerts and rules** > **Manage rules** to be taken to -the {kib} *{rules-ui}* page. -From this page, you can disable, mute, and delete APM alerts. - -[float] -[[apm-alert-more-info]] -== More information - -See {kibana-ref}/alerting-getting-started.html[Alerting] for more information. - -NOTE: If you are using an **on-premise** Elastic Stack deployment with security, -communication between Elasticsearch and Kibana must have TLS configured. -More information is in the alerting {kibana-ref}/alerting-setup.html#alerting-prerequisites[prerequisites]. diff --git a/docs/en/observability/apm-anomaly-rule.asciidoc b/docs/en/observability/apm-anomaly-rule.asciidoc new file mode 100644 index 0000000000..1ee98ca557 --- /dev/null +++ b/docs/en/observability/apm-anomaly-rule.asciidoc @@ -0,0 +1,91 @@ +[[apm-anomaly-rule]] += APM Anomaly rule + +APM Anomaly rules trigger when the latency, throughput, or failed transaction rate of a service is abnormal. + +[discrete] +[[apm-anomaly-rule-filters-conditions]] +== Filters and conditions + +Because some parts of an application may be more important than others, you might have a different tolerance +for abnormal performance across services in your application. You can filter the services in your application to
+apply an APM Anomaly rule to specific services (`SERVICE`), transaction types (`TYPE`), and environments (`ENVIRONMENT`). + +Then, you can specify which conditions should result in an alert. This includes specifying: + +* The types of anomalies that are detected (`DETECTOR TYPES`): `latency`, `throughput`, and/or `failed transaction rate`. +* The severity level (`HAS ANOMALY WITH SEVERITY`): `critical`, `major`, `minor`, `warning`. + +.Example +**** +This example creates a rule for all production services that would result in an alert when a critical latency +anomaly is detected: + +image::apm-anomaly-rule-filters-conditions.png[width=600] +**** + +[discrete] +== Rule schedule + +include::../shared/alerting-and-rules/generic-apm-rule-schedule.asciidoc[] + +[discrete] +== Advanced options + +include::../shared/alerting-and-rules/generic-apm-advanced-options.asciidoc[] + +[discrete] +== Actions + +Extend your rules by connecting them to actions that use built-in integrations. + +[discrete] +=== Action types + +Supported built-in integrations include: + +include::../shared/alerting-and-rules/alerting-connectors.asciidoc[] + +[discrete] +=== Action frequency + +include::../shared/alerting-and-rules/generic-apm-action-frequency.asciidoc[] + +[discrete] +[[apm-anomaly-rule-action-variables]] +=== Action variables + +A default message is provided as a starting point for your alert. +If you want to customize the message, add more context to the message by clicking the icon above +the message text box and selecting from a list of available variables. + +TIP: To add variables to alert messages, use https://mustache.github.io/[Mustache] template syntax, for example `{{variable.name}}`. + +image::apm-anomaly-rule-action-variables.png[width=600] + +The following variables are specific to this rule type. +You can also specify {kibana-ref}/rule-action-variables.html[variables common to all rules]. + +`context.alertDetailsUrl`:: +Link to the alert troubleshooting view for further context and details. This will be an empty string if `server.publicBaseUrl` is not configured.
+ +`context.environment`:: +The environment the alert is created for. + +`context.reason`:: +A concise description of the reason for the alert. + +`context.serviceName`:: +The service the alert is created for. + +`context.threshold`:: +Any trigger value above this value will cause the alert to fire. + +`context.transactionType`:: +The transaction type the alert is created for. + +`context.triggerValue`:: +The value that breached the threshold and triggered the alert. + +`context.viewInAppUrl`:: +Link to the alert source. \ No newline at end of file diff --git a/docs/en/observability/apm-error-count-threshold-rule.asciidoc b/docs/en/observability/apm-error-count-threshold-rule.asciidoc new file mode 100644 index 0000000000..69bd67716d --- /dev/null +++ b/docs/en/observability/apm-error-count-threshold-rule.asciidoc @@ -0,0 +1,121 @@ +[[apm-error-count-threshold-rule]] += Error count threshold rule + +Alert when the number of errors in a service exceeds a defined threshold. Error count rules can be set at the +environment level, service level, and error group level. + +[discrete] +[[apm-error-count-threshold-rule-filters-conditions]] +== Filters and conditions + +Filter the errors coming from your application to apply an Error count threshold rule to a specific +service (`SERVICE`), environment (`ENVIRONMENT`), or error grouping key (`ERROR GROUPING KEY`). +Alternatively, you can use a {kibana-ref}/kuery-query.html[KQL filter] to limit the scope of the alert +by toggling on the *Use KQL Filter* option. + +[TIP] +==== +Similar errors are grouped together to make it easy to quickly see which errors are affecting your services and to take actions to rectify them. Each group of errors has a unique _error grouping key_ — a hash of the stack trace and other properties. +==== + +Then, you can specify which conditions should result in an alert. This includes specifying: + +* The number of errors that occurred (`IS ABOVE`). +* The timeframe in which the errors must occur (`FOR THE LAST`) in seconds, minutes, hours, or days. + +.Example +**** +This example creates a rule for all production services that would result in an alert when there are 25 errors +in the last five minutes: + +image::apm-error-count-rule-filters-conditions.png[width=600] + +Alternatively, you can use a KQL filter to limit the scope of the alert: + +. Toggle on *Use KQL Filter*. +. Add a filter: ++ +[source,txt] +------ +service.environment:"Production" +------ +**** + +[discrete] +== Groups + +include::../shared/alerting-and-rules/generic-apm-group-by.asciidoc[] + +[discrete] +== Rule schedule + +include::../shared/alerting-and-rules/generic-apm-rule-schedule.asciidoc[] + +[discrete] +== Advanced options + +include::../shared/alerting-and-rules/generic-apm-advanced-options.asciidoc[] + +[discrete] +== Actions + +Extend your rules by connecting them to actions that use built-in integrations. + +[discrete] +=== Action types + +Supported built-in integrations include: + +include::../shared/alerting-and-rules/alerting-connectors.asciidoc[] + +[discrete] +=== Action frequency + +include::../shared/alerting-and-rules/generic-apm-action-frequency.asciidoc[] + +[discrete] +=== Action variables + +A default message is provided as a starting point for your alert. +If you want to customize the message, add more context to the message by clicking the icon above +the message text box and selecting from a list of available variables.
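
For instance, a customized message for this rule might combine several of the action variables listed below, using the Mustache syntax described in the tip that follows. This is a minimal sketch rather than the exact default template; adjust the wording and variables to suit your workflow:

[source,txt]
------
Error count threshold alert for {{context.serviceName}} ({{context.environment}})

{{context.reason}}

Errors in the last {{context.interval}}: {{context.triggerValue}} (threshold: {{context.threshold}})
Alert details: {{context.alertDetailsUrl}}
------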
+ +TIP: To add variables to alert messages, use https://mustache.github.io/[Mustache] template syntax, for example `{{variable.name}}`. + +image::apm-error-count-rule-action-variables.png[width=600] + +The following variables are specific to this rule type. +You can also specify {kibana-ref}/rule-action-variables.html[variables common to all rules]. + +`context.alertDetailsUrl`:: +Link to the alert troubleshooting view for further context and details. This will be an empty string if `server.publicBaseUrl` is not configured. + +`context.environment`:: +The environment the alert is created for. + +`context.errorGroupingKey`:: +The error grouping key the alert is created for. + +`context.errorGroupingName`:: +The error grouping name the alert is created for. + +`context.interval`:: +The length and unit of the time period where the alert conditions were met. + +`context.reason`:: +A concise description of the reason for the alert. + +`context.serviceName`:: +The service the alert is created for. + +`context.threshold`:: +Any trigger value above this value will cause the alert to fire. + +`context.transactionName`:: +The transaction name the alert is created for. + +`context.triggerValue`:: +The value that breached the threshold and triggered the alert. + +`context.viewInAppUrl`:: +Link to the alert source. diff --git a/docs/en/observability/apm-failed-transaction-rate-threshold-rule.asciidoc b/docs/en/observability/apm-failed-transaction-rate-threshold-rule.asciidoc new file mode 100644 index 0000000000..bfb50ad562 --- /dev/null +++ b/docs/en/observability/apm-failed-transaction-rate-threshold-rule.asciidoc @@ -0,0 +1,111 @@ +[[apm-failed-transaction-rate-threshold-rule]] += Failed transaction rate threshold rule + +Alert when the rate of transaction errors in a service exceeds a defined threshold. + +[discrete] +== Filters and conditions + +Filter the transactions coming from your application to apply a Failed transaction rate threshold rule to a specific +service (`SERVICE`), environment (`ENVIRONMENT`), transaction type (`TYPE`), or transaction name (`NAME`). +Alternatively, you can use a {kibana-ref}/kuery-query.html[KQL filter] to limit the scope of the alert +by toggling on the *Use KQL Filter* option. + +Then, you can specify which conditions should result in an alert. This includes specifying: + +* The percent of transactions that failed (`IS ABOVE`). +* The timeframe in which the failures must occur (`FOR THE LAST`) in seconds, minutes, hours, or days. + +.Example +**** +This example creates a rule for all production services that would result in an alert when at least 30% +of transactions failed in the last hour: + +image::apm-failed-transaction-rate-threshold-rule-filters-conditions.png[width=600] + +Alternatively, you can use a KQL filter to limit the scope of the alert: + +. Toggle on *Use KQL Filter*. +. Add a filter: ++ +[source,txt] +------ +service.environment:"Production" +------ +**** + +[discrete] +== Groups + +include::../shared/alerting-and-rules/generic-apm-group-by.asciidoc[] + +[discrete] +== Rule schedule + +include::../shared/alerting-and-rules/generic-apm-rule-schedule.asciidoc[] + +[discrete] +== Advanced options + +include::../shared/alerting-and-rules/generic-apm-advanced-options.asciidoc[] + +[discrete] +== Actions + +Extend your rules by connecting them to actions that use built-in integrations. + +[discrete] +=== Action types + +Supported built-in integrations include:
+ +include::../shared/alerting-and-rules/alerting-connectors.asciidoc[] + +[discrete] +=== Action frequency + +include::../shared/alerting-and-rules/generic-apm-action-frequency.asciidoc[] + +[discrete] +=== Action variables + +A default message is provided as a starting point for your alert. +If you want to customize the message, add more context to the message by clicking the icon above +the message text box and selecting from a list of available variables. + +TIP: To add variables to alert messages, use https://mustache.github.io/[Mustache] template syntax, for example `{{variable.name}}`. + +image::apm-failed-transaction-rate-threshold-rule-action-variables.png[width=600] + +The following variables are specific to this rule type. +You can also specify {kibana-ref}/rule-action-variables.html[variables common to all rules]. + +`context.alertDetailsUrl`:: +Link to the alert troubleshooting view for further context and details. This will be an empty string if `server.publicBaseUrl` is not configured. + +`context.environment`:: +The environment the alert is created for. + +`context.interval`:: +The length and unit of the time period where the alert conditions were met. + +`context.reason`:: +A concise description of the reason for the alert. + +`context.serviceName`:: +The service the alert is created for. + +`context.threshold`:: +Any trigger value above this value will cause the alert to fire. + +`context.transactionName`:: +The transaction name the alert is created for. + +`context.transactionType`:: +The transaction type the alert is created for. + +`context.triggerValue`:: +The value that breached the threshold and triggered the alert. + +`context.viewInAppUrl`:: +Link to the alert source. \ No newline at end of file diff --git a/docs/en/observability/apm-latency-threshold-rule.asciidoc b/docs/en/observability/apm-latency-threshold-rule.asciidoc new file mode 100644 index 0000000000..9fc7dddcda --- /dev/null +++ b/docs/en/observability/apm-latency-threshold-rule.asciidoc @@ -0,0 +1,113 @@ +[[apm-latency-threshold-rule]] += Latency threshold rule + +Alert when the latency of a service exceeds a defined threshold. +Latency threshold rules can be as broad or as granular as you'd like, enabling you to define exactly when you want to be alerted--whether that's at the environment level, service name level, transaction type level, and/or transaction name level. + +[discrete] +== Filters and conditions + +Filter the transactions coming from your application to apply a Latency threshold rule to specific +services (`SERVICE`), environments (`ENVIRONMENT`), transaction types (`TYPE`), or transaction names (`NAME`). +Alternatively, you can use a {kibana-ref}/kuery-query.html[KQL filter] to limit the scope of the alert +by toggling on the *Use KQL Filter* option. + +Then, you can specify which conditions should result in an alert. This includes specifying: + +* Which latency measurement to evaluate against (`WHEN`): average, 95th percentile, or 99th percentile. +* The minimum value of the chosen latency measurement (`IS ABOVE`) in milliseconds. +* The timeframe in which the threshold must be exceeded (`FOR THE LAST`) in seconds, minutes, hours, or days. + +.Example +**** +This example creates a rule for all `request` transactions coming from production that would result in +an alert when the average latency is above 1 second (1000ms) for the last 30 minutes: + +image::apm-latency-threshold-rule-filters-conditions.png[width=600] + +Alternatively, you can use a KQL filter to limit the scope of the alert: + +.
Toggle on *Use KQL Filter*. +. Add a filter: ++ +[source,txt] +------ +service.environment:"Production" and transaction.type:"request" +------ +**** + +[discrete] +== Groups + +include::../shared/alerting-and-rules/generic-apm-group-by.asciidoc[] + +[discrete] +== Rule schedule + +include::../shared/alerting-and-rules/generic-apm-rule-schedule.asciidoc[] + +[discrete] +== Advanced options + +include::../shared/alerting-and-rules/generic-apm-advanced-options.asciidoc[] + +[discrete] +== Actions + +Extend your rules by connecting them to actions that use built-in integrations. + +[discrete] +=== Action types + +Supported built-in integrations include: + +include::../shared/alerting-and-rules/alerting-connectors.asciidoc[] + +[discrete] +=== Action frequency + +include::../shared/alerting-and-rules/generic-apm-action-frequency.asciidoc[] + +[discrete] +=== Action variables + +A default message is provided as a starting point for your alert. +If you want to customize the message, add more context to the message by clicking the icon above +the message text box and selecting from a list of available variables. + +TIP: To add variables to alert messages, use https://mustache.github.io/[Mustache] template syntax, for example `{{variable.name}}`. + +image::apm-latency-threshold-rule-action-variables.png[width=600] + +The following variables are specific to this rule type. +You can also specify {kibana-ref}/rule-action-variables.html[variables common to all rules]. + +`context.alertDetailsUrl`:: +Link to the alert troubleshooting view for further context and details. This will be an empty string if `server.publicBaseUrl` is not configured. + +`context.environment`:: +The environment the alert is created for. + +`context.interval`:: +The length and unit of the time period where the alert conditions were met. + +`context.reason`:: +A concise description of the reason for the alert. + +`context.serviceName`:: +The service the alert is created for. + +`context.threshold`:: +Any trigger value above this value will cause the alert to fire. + +`context.transactionName`:: +The transaction name the alert is created for. + +`context.transactionType`:: +The transaction type the alert is created for. + +`context.triggerValue`:: +The value that breached the threshold and triggered the alert. + +`context.viewInAppUrl`:: +Link to the alert source. diff --git a/docs/en/observability/apm/act-on-data/alerts.asciidoc b/docs/en/observability/apm/act-on-data/alerts.asciidoc new file mode 100644 index 0000000000..3650ece61f --- /dev/null +++ b/docs/en/observability/apm/act-on-data/alerts.asciidoc @@ -0,0 +1,99 @@ +[[apm-alerts]] += Create APM rules and alerts + +++++ +Create rules and alerts +++++ + +The Applications UI allows you to define *rules* to detect complex conditions within your APM data +and trigger built-in *actions* when those conditions are met. + +[discrete] +== APM rules + +The following APM rules are supported: + +[cols="1,1"] +|=== +| *APM Anomaly* +a| Alert when the latency, throughput, or failed transaction rate of a service is anomalous. +Anomaly rules can be set at the environment level, service level, and/or transaction type level. + +Read more in <<apm-anomaly-rule>> + +| *Error count threshold* +a| Alert when the number of errors in a service exceeds a defined threshold. Error count rules can be set at the +environment level, service level, and error group level.
+ +Read more in <<apm-error-count-threshold-rule>> + +| *Failed transaction rate threshold* +a| Alert when the rate of transaction errors in a service exceeds a defined threshold. + +Read more in <<apm-failed-transaction-rate-threshold-rule>> + +| *Latency threshold* +a| Alert when the latency of a service exceeds a defined threshold. +Latency threshold rules can be as broad or as granular as you'd like, enabling you to define exactly when you want to be alerted--whether that's at the environment level, service name level, transaction type level, and/or transaction name level. + +Read more in <<apm-latency-threshold-rule>> + +|=== + +// [role="screenshot"] +// image::./images/apm-alert.png[Create an alert in the Applications UI] + +[TIP] +==== +For a complete walkthrough of the **Create rule** flyout panel, including detailed information on each configurable property, +see Kibana's {kibana-ref}/create-and-manage-rules.html[Create and manage rules]. +==== + +[discrete] +== Rules and alerts in the Applications UI + +View and manage rules and alerts in the Applications UI. + +[float] +[[apm-alert-view-active]] +=== View active alerts + +Active alerts are displayed and grouped in multiple ways in the Applications UI. + +[float] +[[apm-alert-view-group]] +==== View alerts by service group + +If you're using the <> feature, you can view alerts by service group. +From the service group overview page, click the red alert indicator to open the **Alerts** tab with a predefined filter that matches the filter used when creating the service group. + +[role="screenshot"] +image::./images/apm-service-group.png[Example view of service group in the Applications UI in Kibana] + +[float] +[[apm-alert-view-service]] +==== View alerts by service + +Alerts can be viewed within the context of any service. +After selecting a service, go to the **Alerts** tab to view any alerts that are active for the selected service. + +[role="screenshot"] +image::./images/active-alert-service.png[View active alerts by service] + +[float] +[[apm-alert-manage]] +=== Manage alerts and rules + +From the Applications UI, select **Alerts and rules** → **Manage rules** to be taken to +the {kib} *{rules-ui}* page. +From this page, you can disable, mute, and delete APM alerts. + +[float] +[[apm-alert-more-info]] +=== More information + +See {kibana-ref}/alerting-getting-started.html[Alerting] for more information. + +NOTE: If you are using an **on-premise** Elastic Stack deployment with security, +communication between Elasticsearch and Kibana must have TLS configured. +More information is in the alerting {kibana-ref}/alerting-setup.html#alerting-prerequisites[prerequisites]. diff --git a/docs/en/observability/apm/act-on-data/index.asciidoc b/docs/en/observability/apm/act-on-data/index.asciidoc index 3803496d60..7cffd39553 100644 --- a/docs/en/observability/apm/act-on-data/index.asciidoc +++ b/docs/en/observability/apm/act-on-data/index.asciidoc @@ -8,17 +8,22 @@ In addition to exploring visualizations in the Applications UI in {kib}, you can make your application data more actionable with: -* *Alerts and rules*: The Applications UI allows you to define rules to detect complex +[cols="1,1"] +|=== +| <> +| The Applications UI allows you to define rules to detect complex conditions within your APM data and trigger built-in actions when those conditions are met. - Read more about alerts and rules in the <>. -* *Custom links*: Build URLs that contain relevant metadata from a specific trace. + +| <> +| Build URLs that contain relevant metadata from a specific trace.
For example, you can create a link that will take you to a page where you can open a new GitHub issue with context already auto-populated in the issue body. These links will be shown in the _Actions_ context menu in selected areas of the Applications UI (for example, by transaction details). - Read more in <>. +|=== :leveloffset: +1 +include::{observability-docs-root}/docs/en/observability/apm/act-on-data/alerts.asciidoc[] include::{observability-docs-root}/docs/en/observability/apm/act-on-data/custom-links.asciidoc[] :!leveloffset: diff --git a/docs/en/observability/create-alerts.asciidoc b/docs/en/observability/create-alerts.asciidoc index 48da556e98..416cb9b5a7 100644 --- a/docs/en/observability/create-alerts.asciidoc +++ b/docs/en/observability/create-alerts.asciidoc @@ -61,10 +61,13 @@ tie into other third-party systems. Connectors allow actions to talk to these se Learn how to create specific types of rules: -* <> +* <> * <> -* <> +* <> +* <> * <> +* <> +* <> * <> * <> * <> @@ -162,14 +165,20 @@ xpack.observability.unsafe.alertingExperience.enabled: 'false' ---- -include::apm-alerts.asciidoc[leveloffset=+2] +include::apm-anomaly-rule.asciidoc[leveloffset=+2] include::threshold-alert.asciidoc[leveloffset=+2] -include::logs-threshold-alert.asciidoc[leveloffset=+2] +include::apm-error-count-threshold-rule.asciidoc[leveloffset=+2] + +include::apm-failed-transaction-rate-threshold-rule.asciidoc[leveloffset=+2] include::inventory-threshold-alert.asciidoc[leveloffset=+2] +include::apm-latency-threshold-rule.asciidoc[leveloffset=+2] + +include::logs-threshold-alert.asciidoc[leveloffset=+2] + include::metrics-threshold-alert.asciidoc[leveloffset=+2] include::monitor-status-alert.asciidoc[leveloffset=+2] diff --git a/docs/en/observability/images/apm-anomaly-rule-action-variables.png b/docs/en/observability/images/apm-anomaly-rule-action-variables.png new file mode 100644 index 0000000000..2d8bf26615 Binary files /dev/null and b/docs/en/observability/images/apm-anomaly-rule-action-variables.png differ diff --git a/docs/en/observability/images/apm-anomaly-rule-filters-conditions.png b/docs/en/observability/images/apm-anomaly-rule-filters-conditions.png new file mode 100644 index 0000000000..746733806d Binary files /dev/null and b/docs/en/observability/images/apm-anomaly-rule-filters-conditions.png differ diff --git a/docs/en/observability/images/apm-error-count-rule-action-variables.png b/docs/en/observability/images/apm-error-count-rule-action-variables.png new file mode 100644 index 0000000000..dab4c66bfe Binary files /dev/null and b/docs/en/observability/images/apm-error-count-rule-action-variables.png differ diff --git a/docs/en/observability/images/apm-error-count-rule-filters-conditions.png b/docs/en/observability/images/apm-error-count-rule-filters-conditions.png new file mode 100644 index 0000000000..5e2ee0b224 Binary files /dev/null and b/docs/en/observability/images/apm-error-count-rule-filters-conditions.png differ diff --git a/docs/en/observability/images/apm-failed-transaction-rate-threshold-rule-action-variables.png b/docs/en/observability/images/apm-failed-transaction-rate-threshold-rule-action-variables.png new file mode 100644 index 0000000000..f555485340 Binary files /dev/null and b/docs/en/observability/images/apm-failed-transaction-rate-threshold-rule-action-variables.png differ diff --git a/docs/en/observability/images/apm-failed-transaction-rate-threshold-rule-filters-conditions.png 
b/docs/en/observability/images/apm-failed-transaction-rate-threshold-rule-filters-conditions.png new file mode 100644 index 0000000000..f319120632 Binary files /dev/null and b/docs/en/observability/images/apm-failed-transaction-rate-threshold-rule-filters-conditions.png differ diff --git a/docs/en/observability/images/apm-latency-threshold-rule-action-variables.png b/docs/en/observability/images/apm-latency-threshold-rule-action-variables.png new file mode 100644 index 0000000000..fdef43438d Binary files /dev/null and b/docs/en/observability/images/apm-latency-threshold-rule-action-variables.png differ diff --git a/docs/en/observability/images/apm-latency-threshold-rule-filters-conditions.png b/docs/en/observability/images/apm-latency-threshold-rule-filters-conditions.png new file mode 100644 index 0000000000..81595fc657 Binary files /dev/null and b/docs/en/observability/images/apm-latency-threshold-rule-filters-conditions.png differ diff --git a/docs/en/observability/inventory-threshold-alert.asciidoc b/docs/en/observability/inventory-threshold-alert.asciidoc index 041921c5ec..4b9061dc39 100644 --- a/docs/en/observability/inventory-threshold-alert.asciidoc +++ b/docs/en/observability/inventory-threshold-alert.asciidoc @@ -47,7 +47,7 @@ image::images/inventory-alert.png[Inventory rule] Extend your rules by connecting them to actions that use the following supported built-in integrations. -include::../shared/alerting-connectors.asciidoc[] +include::../shared/alerting-and-rules/alerting-connectors.asciidoc[] After you select a connector, you must set the action frequency. You can choose to create a summary of alerts on each check interval or on a custom interval. For example, send email notifications that summarize the new, ongoing, and recovered alerts each hour: diff --git a/docs/en/observability/logs-threshold-alert.asciidoc b/docs/en/observability/logs-threshold-alert.asciidoc index 8a68bdb939..04a987d2ba 100644 --- a/docs/en/observability/logs-threshold-alert.asciidoc +++ b/docs/en/observability/logs-threshold-alert.asciidoc @@ -108,7 +108,7 @@ ratio. In this scenario, no alert is triggered. Extend your rules by connecting them to actions that use the following supported built-in integrations. -include::../shared/alerting-connectors.asciidoc[] +include::../shared/alerting-and-rules/alerting-connectors.asciidoc[] After you select a connector, you must set the action frequency. You can choose to create a summary of alerts on each check interval or on a custom interval. Alternatively, you can set the action frequency such that you choose how often the action runs (for example, at each check interval, only when the alert status changes, or at a custom action interval). In this case, you must also select the specific threshold condition that affects when actions run: `Fired` or `Recovered`. diff --git a/docs/en/observability/metrics-threshold-alert.asciidoc b/docs/en/observability/metrics-threshold-alert.asciidoc index 5c799985a9..626318340c 100644 --- a/docs/en/observability/metrics-threshold-alert.asciidoc +++ b/docs/en/observability/metrics-threshold-alert.asciidoc @@ -68,7 +68,7 @@ The default value is `1`. Extend your rules by connecting them to actions that use the following supported built-in integrations. -include::../shared/alerting-connectors.asciidoc[] +include::../shared/alerting-and-rules/alerting-connectors.asciidoc[] After you select a connector, you must set the action frequency. You can choose to create a summary of alerts on each check interval or on a custom interval. 
For example, send email notifications that summarize the new, ongoing, and recovered alerts each hour: diff --git a/docs/en/observability/monitor-status-alert.asciidoc b/docs/en/observability/monitor-status-alert.asciidoc index 826b9e0451..4ec6361d2a 100644 --- a/docs/en/observability/monitor-status-alert.asciidoc +++ b/docs/en/observability/monitor-status-alert.asciidoc @@ -75,7 +75,7 @@ image::images/synthetic-monitor-conditions.png[Filters and conditions defining a Extend your rules by connecting them to actions that use the following supported built-in integrations. -include::../shared/alerting-connectors.asciidoc[width=600] +include::../shared/alerting-and-rules/alerting-connectors.asciidoc[] After you select a connector, you must set the action frequency. You can choose to create a summary of alerts on each check interval or on a custom interval. diff --git a/docs/en/observability/slo-burn-rate-alert.asciidoc b/docs/en/observability/slo-burn-rate-alert.asciidoc index c553bc6030..711f990ce0 100644 --- a/docs/en/observability/slo-burn-rate-alert.asciidoc +++ b/docs/en/observability/slo-burn-rate-alert.asciidoc @@ -38,7 +38,7 @@ third-party systems that run as background tasks on the {kib} server when rule c You can configure action types on the <> page. -include::../shared/alerting-connectors.asciidoc[] +include::../shared/alerting-and-rules/alerting-connectors.asciidoc[] After you select a connector, you must set the action frequency. You can choose to create a *Summary of alerts* on each check interval or on a custom interval. For example, you can send email notifications that summarize the new, ongoing, and recovered alerts every twelve hours. diff --git a/docs/en/observability/threshold-alert.asciidoc b/docs/en/observability/threshold-alert.asciidoc index 371960e683..202e7a6341 100644 --- a/docs/en/observability/threshold-alert.asciidoc +++ b/docs/en/observability/threshold-alert.asciidoc @@ -155,7 +155,7 @@ For example when it's set to `Logs`, you must have the appropriate *{observabili Extend your rules by connecting them to actions that use the following supported built-in integrations. -include::../shared/alerting-connectors.asciidoc[] +include::../shared/alerting-and-rules/alerting-connectors.asciidoc[] After you select a connector, you must set the action frequency. You can choose to create a summary of alerts on each check interval or on a custom interval. Alternatively, you can set the action frequency such that you choose how often the action runs (for example, at each check interval, only when the alert status changes, or at a custom action interval). In this case, you must also select the specific threshold condition that affects when actions run: `Alert`, `No Data`, or `Recovered`. diff --git a/docs/en/observability/uptime-duration-anomaly-alert.asciidoc b/docs/en/observability/uptime-duration-anomaly-alert.asciidoc index cb39846a37..3ce1ea8cae 100644 --- a/docs/en/observability/uptime-duration-anomaly-alert.asciidoc +++ b/docs/en/observability/uptime-duration-anomaly-alert.asciidoc @@ -48,7 +48,7 @@ third-party systems that run as background tasks on the {kib} server when rule c You can configure action types on the <> page. -include::../shared/alerting-connectors.asciidoc[] +include::../shared/alerting-and-rules/alerting-connectors.asciidoc[] After you select a connector, you must set the action frequency. You can choose to create a summary of alerts on each check interval or on a custom interval. 
For example, send email notifications that summarize the new, ongoing, and recovered alerts every twelve hours: diff --git a/docs/en/observability/uptime-tls-alert.asciidoc b/docs/en/observability/uptime-tls-alert.asciidoc index f76aa1889f..94a97e7967 100644 --- a/docs/en/observability/uptime-tls-alert.asciidoc +++ b/docs/en/observability/uptime-tls-alert.asciidoc @@ -46,7 +46,7 @@ third-party systems that run as background tasks on the {kib} server when rule c You can configure action types on the <> page. -include::../shared/alerting-connectors.asciidoc[] +include::../shared/alerting-and-rules/alerting-connectors.asciidoc[] After you select a connector, you must set the action frequency. You can choose to create a summary of alerts on each check interval or on a custom interval. Alternatively, you can set the action frequency such that you choose how often the action runs (for example, at each check interval, only when the alert status changes, or at a custom action interval). In this case, you must also select the specific threshold condition that affects when actions run: `Uptime TLS Alert` or `Recovered`. For example, send a notification when an alert status changes: diff --git a/docs/en/shared/alerting-connectors.asciidoc b/docs/en/shared/alerting-and-rules/alerting-connectors.asciidoc similarity index 100% rename from docs/en/shared/alerting-connectors.asciidoc rename to docs/en/shared/alerting-and-rules/alerting-connectors.asciidoc diff --git a/docs/en/shared/alerting-and-rules/generic-apm-action-frequency.asciidoc b/docs/en/shared/alerting-and-rules/generic-apm-action-frequency.asciidoc new file mode 100644 index 0000000000..e600c0cdc9 --- /dev/null +++ b/docs/en/shared/alerting-and-rules/generic-apm-action-frequency.asciidoc @@ -0,0 +1,8 @@ +After you select a connector, you must set the action frequency. +You can choose to create a summary of alerts on each check interval or on a custom interval. +Alternatively, you can set the action frequency such that you choose how often the action runs (for example, at each check interval, only when the alert status changes, or at a custom action interval). + +You can also further refine the conditions under which actions run by specifying that actions only run when they match a KQL query or when an alert occurs within a specific time frame: + +* *If alert matches query*: Enter a KQL query that defines field-value pairs or query conditions that must be met for notifications to send. The query only searches alert documents in the indices specified for the rule. +* *If alert is generated during timeframe*: Set timeframe details. Notifications are only sent if alerts are generated within the timeframe you define. diff --git a/docs/en/shared/alerting-and-rules/generic-apm-advanced-options.asciidoc b/docs/en/shared/alerting-and-rules/generic-apm-advanced-options.asciidoc new file mode 100644 index 0000000000..e5114f1adc --- /dev/null +++ b/docs/en/shared/alerting-and-rules/generic-apm-advanced-options.asciidoc @@ -0,0 +1,2 @@ +Optionally define an *Alert delay*. +An alert will only occur when the specified number of consecutive runs meet the rule conditions.
\ No newline at end of file diff --git a/docs/en/shared/alerting-and-rules/generic-apm-group-by.asciidoc b/docs/en/shared/alerting-and-rules/generic-apm-group-by.asciidoc new file mode 100644 index 0000000000..8f756015b9 --- /dev/null +++ b/docs/en/shared/alerting-and-rules/generic-apm-group-by.asciidoc @@ -0,0 +1,24 @@ +Set one or more *group alerts by* fields for custom threshold rules to perform a composite aggregation against the selected fields. +When any of these groups match the selected rule conditions, an alert is triggered _per group_. + +When you select multiple groupings, the group name is separated by commas. + +When you select *Alert me if a group stops reporting data*, the rule is triggered if a group that previously +reported metrics does not report them again over the expected time period. + +.Example: Group by one field +**** +If you group alerts by the `service.name` field and there are two services (`Service A` and `Service B`), +when `Service A` matches the conditions but `Service B` doesn't, one alert is triggered for `Service A`. +If both groups match the conditions, alerts are triggered for both groups. +**** + +.Example: Group by multiple fields +**** +If you group alerts by both the `service.name` and `service.environment` fields, +and there are two services (`Service A` and `Service B`) and two environments (`Production` and `Staging`), +the composite aggregation forms multiple groups. + +If the `Service A, Production` group matches the rule conditions, but the `Service B, Staging` group doesn't, +one alert is triggered for `Service A, Production`. +**** diff --git a/docs/en/shared/alerting-and-rules/generic-apm-rule-schedule.asciidoc b/docs/en/shared/alerting-and-rules/generic-apm-rule-schedule.asciidoc new file mode 100644 index 0000000000..b326bcf587 --- /dev/null +++ b/docs/en/shared/alerting-and-rules/generic-apm-rule-schedule.asciidoc @@ -0,0 +1,2 @@ +Define how often to evaluate the condition in seconds, minutes, hours, or days. +Checks are queued so they run as close to the defined value as capacity allows. \ No newline at end of file
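
The *If alert matches query* option described in the shared action frequency snippet above accepts a KQL query that runs against the alert documents produced by the rule, and notifications are sent only for alerts that match it. A minimal sketch of such a query, assuming the alert documents carry the `service.name` and `service.environment` fields used in the filter examples earlier in this document (the service name `opbeans-java` is hypothetical):

[source,txt]
------
service.name:"opbeans-java" and service.environment:"Production"
------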