[DOC] Update metrics query docs with examples, more details #4248

Merged
21 changes: 12 additions & 9 deletions docs/sources/tempo/api_docs/_index.md
@@ -31,7 +31,7 @@ For externally supported GRPC API, [see below](#tempo-grpc-api).
| [Search tag names V2](#search-tags-v2) | Query-frontend | HTTP | `GET /api/v2/search/tags` |
| [Search tag values](#search-tag-values) | Query-frontend | HTTP | `GET /api/search/tag/<tag>/values` |
| [Search tag values V2](#search-tag-values-v2) | Query-frontend | HTTP | `GET /api/v2/search/tag/<tag>/values` |
| [TraceQL Metrics](#traceql-metrics) | Query-frontend | HTTP | `GET /api/metrics/query_range` |
| [TraceQL Metrics](#traceql-metrics) | Query-frontend | HTTP | `GET /api/metrics/query_range` |
| [TraceQL Metrics (instant)](#instant) | Query-frontend | HTTP | `GET /api/metrics/query` |
| [Query Echo Endpoint](#query-echo-endpoint) | Query-frontend | HTTP | `GET /api/echo` |
| [Overrides API](#overrides-api) | Query-frontend | HTTP | `GET,POST,PATCH,DELETE /api/overrides` |
@@ -312,8 +312,9 @@ $ curl -G -s http://localhost:3200/api/search --data-urlencode 'tags=service.nam

Ingester configuration `complete_block_timeout` affects how long tags are available for search.

This endpoint retrieves all discovered tag names that can be used in search. The endpoint is available in the query frontend service in
a microservices deployment, or the Tempo endpoint in a monolithic mode deployment. The tags endpoint takes a scope that controls the kinds
This endpoint retrieves all discovered tag names that can be used in search.
The endpoint is available in the query frontend service in a microservices deployment, or the Tempo endpoint in a monolithic mode deployment.
The tags endpoint takes a scope that controls the kinds
of tags or attributes returned. If nothing is provided, the endpoint returns all resource and span tags.

```
@@ -584,7 +585,9 @@ If a particular service name (for example, `shopping-cart`) is only present on s

### TraceQL Metrics

The TraceQL Metrics API returns Prometheus-like time-series for a given metrics query. Metrics queries are those using metrics functions like `rate()` and `quantile_over_time()`. See the [documentation]({{< relref "../traceql/metrics-queries" >}}) for the complete list.
The TraceQL Metrics API returns Prometheus-like time-series for a given metrics query.
Metrics queries are those using metrics functions like `rate()` and `quantile_over_time()`.
Refer to the [TraceQL metrics documentation](https://grafana.com/docs/tempo/<TEMPO_VERSION>/traceql/metrics-queries/) for the complete list.

Parameters:

@@ -595,20 +598,20 @@ Parameters:
- `end = (unix epoch seconds | unix epoch nanoseconds | RFC3339 string)`
Optional. Along with `start` define the time range. Providing both `start` and `end` includes blocks for the specified time range only.
- `since = (duration string)`
Optional. Can be used instead of `start` and `end` to define the time range in relative values. For example `since=15m` will query the last 15 minutes. Default is last 1 hour.
Optional. Can be used instead of `start` and `end` to define the time range in relative values. For example, `since=15m` queries the last 15 minutes. Default is the last 1 hour.
- `step = (duration string)`
Optional. Defines the granularity of the returned time-series. For example `step=15s` will return a data point every 15s within the time range. If not specified then the default behavior will choose a dynamic step based on the time range.
Optional. Defines the granularity of the returned time-series. For example, `step=15s` returns a data point every 15s within the time range. If not specified, then the default behavior chooses a dynamic step based on the time range.
- `exemplars = (integer)`
Optional. Defines the maximum number of exemplars for the query. The value is capped at `max_exemplars` if it exceeds that limit.

The API is available in the query frontend service in
a microservices deployment, or the Tempo endpoint in a monolithic mode deployment.
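
As a minimal sketch of how the parameters above combine into a request, the following Python snippet builds a URL-encoded `query_range` URL. The helper name `build_query_range_url` and the base URL `http://localhost:3200` are assumptions for illustration only; they are not part of the Tempo API.

```python
from urllib.parse import quote, urlencode


def build_query_range_url(base_url, query, since=None, step=None, exemplars=None):
    """Build a URL-encoded request URL for the TraceQL metrics range endpoint.

    Hypothetical helper: only the endpoint path and the parameter names
    (q, since, step, exemplars) come from the documentation above.
    """
    params = {"q": query}
    if since is not None:
        params["since"] = since
    if step is not None:
        params["step"] = step
    if exemplars is not None:
        params["exemplars"] = str(exemplars)
    # quote_via=quote encodes spaces as %20 instead of "+".
    return f"{base_url}/api/metrics/query_range?{urlencode(params, quote_via=quote)}"


# Assumed local Tempo address; adjust it for your deployment.
url = build_query_range_url(
    "http://localhost:3200",
    '{resource.service.name="myservice"} | rate()',
    since="3h",
    step="1m",
)
print(url)
```

Omitted optional parameters are left out of the URL, so the server-side defaults described above apply.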

For example the following request computes the rate of spans received for `myservice` over the last three hours, at 1 minute intervals.
For example, the following request computes the rate of spans received for `myservice` over the last three hours, at 1 minute intervals.

{{< admonition type="note" >}}
Actual API parameters must be url-encoded. This example is left unencoded for readability.
{{% /admonition %}}
{{< /admonition >}}

```
GET /api/metrics/query_range?q={resource.service.name="myservice"} | min_over_time() with(exemplars=true) &since=3h&step=1m&exemplars=100
@@ -855,6 +858,6 @@ service StreamingQuerier {
rpc SearchTagsV2(SearchTagsRequest) returns (stream SearchTagsV2Response) {}
rpc SearchTagValues(SearchTagValuesRequest) returns (stream SearchTagValuesResponse) {}
rpc SearchTagValuesV2(SearchTagValuesRequest) returns (stream SearchTagValuesV2Response) {}
rpc MetricsQueryRange(QueryRangeRequest) returns (stream QueryRangeResponse) {}
rpc MetricsQueryRange(QueryRangeRequest) returns (stream QueryRangeResponse) {}
}
```
2 changes: 1 addition & 1 deletion docs/sources/tempo/metrics-generator/_index.md
@@ -10,7 +10,7 @@ weight: 500
# Metrics-generator

Metrics-generator is an optional Tempo component that derives metrics from ingested traces.
If present, the distributors write received spans to both the ingester and the metrics-generator.
If present, the distributor writes received spans to both the ingester and the metrics-generator.
The metrics-generator processes spans and writes metrics to a Prometheus data source using the Prometheus remote write protocol.

## Architecture
40 changes: 40 additions & 0 deletions docs/sources/tempo/operations/traceql-metrics.md
Expand Up @@ -84,6 +84,46 @@ Setting `flush_to_storage` to `true` ensures that metrics blocks are flushed to

For more information about overrides, refer to [Standard overrides](https://grafana.com/docs/tempo/<TEMPO_VERSION>/configuration/#standard-overrides).

```yaml
overrides:
  'tenantID':
    metrics_generator_processors:
      - local-blocks
```

By default, for all tenants in the main configuration:

```yaml
overrides:
  defaults:
    metrics_generator:
      processors: [local-blocks]
```

Add this configuration to run TraceQL metrics queries against all spans (and not just server spans):

```yaml
metrics_generator:
  processor:
    local_blocks:
      filter_server_spans: false
```

If you configured Tempo using the `tempo-distributed` Helm chart, you can also set `traces_storage` using your `values.yaml` file.
Refer to the [Helm chart for an example](https://github.com/grafana/helm-charts/blob/559ecf4a9c9eefac4521454e7a8066778e4eeff7/charts/tempo-distributed/values.yaml#L362).

```yaml
metrics_generator:
  processor:
    local_blocks:
      flush_to_storage: true
```

Setting `flush_to_storage` to `true` ensures that metrics blocks are flushed to storage so that TraceQL metrics queries can run against historical data.

For more information about overrides, refer to [Standard overrides](https://grafana.com/docs/tempo/<TEMPO_VERSION>/configuration/#standard-overrides).


## Evaluate query timeouts

Because of their expensive nature, these queries can take a long time to run.
87 changes: 87 additions & 0 deletions docs/sources/tempo/traceql/metrics-queries/_index.md
@@ -0,0 +1,87 @@
---
title: TraceQL metrics queries
menuTitle: TraceQL metrics queries
description: Learn about TraceQL metrics queries
weight: 600
keywords:
- metrics query
- TraceQL metrics
---

# TraceQL metrics queries

{{< docs/experimental product="TraceQL metrics" >}}

TraceQL metrics is an experimental feature in Grafana Tempo that creates metrics from traces.

Metric queries extend trace queries by applying a function to trace query results.
This powerful feature allows for ad hoc aggregation of any existing TraceQL query by any dimension available in your traces, much in the same way that LogQL metric queries create metrics from logs.

Traces are a unique observability signal that contain causal relationships between the components in your system.

TraceQL metrics can help answer questions like these:

* How many database calls across all systems are downstream of your application?
* What services beneath a given endpoint are currently failing?
* What services beneath an endpoint are currently slow?

TraceQL metrics can help you answer these questions by parsing your traces in aggregate.

TraceQL metrics are powered by the [TraceQL metrics API](https://grafana.com/docs/tempo/<TEMPO_VERSION>/api_docs/#traceql-metrics).

![Metrics visualization in Grafana](/media/docs/tempo/metrics-explore-sample-2.4.png)

## RED metrics, TraceQL, and PromQL

RED is an acronym for three types of metrics:

- Rate, the number of requests per second
- Errors, the number of those requests that are failing
- Duration, the amount of time those requests take

For more information about the RED method, refer to [The RED Method: how to instrument your services](/blog/2018/08/02/the-red-method-how-to-instrument-your-services/).

You can write TraceQL metrics queries to compute rate, errors, and durations over different groups of spans.
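
As a rough illustration, the sketch below generates one TraceQL metrics query per RED signal for a single service. The helper `red_queries` and the service name `myservice` are hypothetical, and using the 99th percentile for duration is an arbitrary choice; treat the queries as starting points rather than recommended definitions.

```python
def red_queries(service):
    """Return one illustrative TraceQL metrics query per RED signal.

    Hypothetical helper: the selector and the .99 quantile are assumptions
    chosen for this example.
    """
    selector = f'resource.service.name = "{service}"'
    return {
        # Rate: spans per second for the service.
        "rate": f"{{ {selector} }} | rate()",
        # Errors: spans per second with an error status.
        "errors": f"{{ {selector} && status = error }} | rate()",
        # Duration: 99th percentile of span duration.
        "duration": f"{{ {selector} }} | quantile_over_time(duration, .99)",
    }


for name, query in red_queries("myservice").items():
    print(f"{name}: {query}")
```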

For more information on how to use TraceQL metrics to investigate issues, refer to [Solve problems with metrics queries](./solve-problems-metrics-queries).

## Enable and use TraceQL metrics

To use TraceQL metrics, you need to enable them on your Tempo database.
Refer to [Configure TraceQL metrics](https://grafana.com/docs/tempo/<TEMPO_VERSION>/operations/traceql-metrics/) for more information.

From there, you can either query the TraceQL metrics API directly (for example, with `curl`) or use Grafana
(recommended).
To run TraceQL metrics queries in Grafana, you need Grafana Cloud or Grafana 10.4 or later.
No extra configuration is needed.
Use a Tempo data source that points to a Tempo database with TraceQL metrics enabled.

Refer to [Solve problems using metrics queries](./solve-problems-metrics-queries/) for some real-world examples.

### Functions

TraceQL metrics queries currently include the following functions for aggregating over groups of spans: `rate`, `count_over_time`, `quantile_over_time`, `histogram_over_time`, and `compare`.
These functions can be added as an operator at the end of any TraceQL query.

For detailed information and example queries for each function, refer to [TraceQL metrics functions](./functions).

### Exemplars

Exemplars are a powerful feature of TraceQL metrics.
They allow you to see an exact trace that contributed to a given metric value.
This is particularly useful when you want to understand why a given metric is high or low.

Exemplars are available in TraceQL metrics for all range queries.
To get exemplars, configure them in the query-frontend with the `query_frontend.metrics.max_exemplars` parameter,
or pass a query hint in your query.

Example:

```
{ span:name = "GET /:endpoint" } | quantile_over_time(duration, .99) by (span.http.target) with (exemplars=true)
```
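
If you build queries programmatically, one possible approach is to append the hint only when it's missing. The following is a sketch under that assumption; `with_exemplars` is a hypothetical helper, not a Tempo or Grafana function.

```python
import re


def with_exemplars(query: str) -> str:
    """Append the exemplars query hint unless the query already has one."""
    # Matches both "with (exemplars=..." and "with(exemplars=...".
    if re.search(r"with\s*\(\s*exemplars\s*=", query):
        return query
    return f"{query.rstrip()} with (exemplars=true)"


print(with_exemplars('{ span:name = "GET /:endpoint" } | rate()'))
```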

{{< admonition type="note" >}}
TraceQL metric queries with exemplars aren't fully supported in Grafana Explore.
They will be supported in a future Grafana release.
{{< /admonition >}}
Comment on lines +84 to +87
This admonition is new and needs to be reviewed.