Replace event.batch_id with event.metric_names
The `event.batch_id` field, with its random values, is a poor choice as a
dimension field for a time series database: it creates a brand-new time
series on every collection iteration, causing unbounded cardinality.

The `event.metric_names` field instead keeps the value limited to a
recurring set of metric names that all share the same ingest delay.
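The switch can be sketched with a pair of toy helpers (hypothetical names, standing in for the removed `uuid.NewUUID()` call and the new `strings.Join` over the batch's metric names):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
)

// randomBatchID stands in for the old event.batch_id: a fresh random
// value on every collection iteration, so every batch of events lands
// in a brand-new time series.
func randomBatchID() string {
	b := make([]byte, 16)
	_, _ = rand.Read(b)
	return hex.EncodeToString(b)
}

// metricNamesDimension stands in for the new event.metric_names: the
// same set of collected metrics yields the same value on every
// iteration, keeping the dimension within a small, recurring set.
func metricNamesDimension(names []string) string {
	return strings.Join(names, ",")
}

func main() {
	names := []string{"l3.external.ingress_packets.count", "l3.external.ingress.bytes"}
	fmt.Println(metricNamesDimension(names)) // stable across iterations
	fmt.Println(randomBatchID())             // different on every run
}
```

With TSDB enabled, the cardinality of each dimension field bounds the number of time series, so a stable, recurring value is essential.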
zmoog committed Oct 16, 2023
1 parent 2e3cd08 commit 71c0aa0
Showing 5 changed files with 52 additions and 40 deletions.
4 changes: 2 additions & 2 deletions metricbeat/docs/fields.asciidoc
@@ -35303,10 +35303,10 @@ GCP module



*`event.batch_id`*::
*`event.metric_names`*::
+
--
The ID of the batch of events created during metrics collection. Every <period> seconds, the metricset fetches new metrics values from GCP and makes a new batch of events. The batch ID is a UUID, for example, 8f7a8c7c-ff6f-11e9-8f0b-0242ac120005.
The comma-separated list of metric names collected in the batch. For example, l3.external.ingress_packets.count,l3.external.ingress.bytes. Required to support TSDB.


type: keyword
3 changes: 2 additions & 1 deletion x-pack/metricbeat/module/gcp/_meta/fields.yml
@@ -7,10 +7,11 @@
- name: event
type: group
fields:
- name: batch_id
- name: metric_names
type: keyword
description: >
The ID of the batch of events created during metrics collection. Every <period> seconds, the metricset fetches new metrics values from GCP and makes a new batch of events. The batch ID is a UUID, for example, 8f7a8c7c-ff6f-11e9-8f0b-0242ac120005.
The comma-separated list of metric names collected in the batch. For example, l3.external.ingress_packets.count,l3.external.ingress.bytes. Required to support TSDB.
- name: gcp
type: group
fields:
2 changes: 1 addition & 1 deletion x-pack/metricbeat/module/gcp/fields.go

Some generated files are not rendered by default.

35 changes: 1 addition & 34 deletions x-pack/metricbeat/module/gcp/metrics/metricset.go
@@ -11,8 +11,6 @@ import (
"strings"
"time"

"github.com/google/uuid"

monitoring "cloud.google.com/go/monitoring/apiv3/v2"
"cloud.google.com/go/monitoring/apiv3/v2/monitoringpb"
"github.com/golang/protobuf/ptypes/duration"
@@ -226,39 +224,8 @@ func (m *MetricSet) mapToEvents(ctx context.Context, timeSeries []timeSeriesWith
// Group the time series values by common traits.
timeSeriesGroups := m.groupTimeSeries(ctx, timeSeries, metadataService, mapper)

// Generate a batch ID for all events collected in this collection.
//
// Why do we need to keep track of which batch the metricset collected the event in?
// ---------------------------------------------------------------------------
//
// GCP metrics have different ingestion delays; after GCP collects a metric from
// a resource, it takes some time to be available for ingestion.
//
// Some metrics have no ingestion delay, while others have a delay of up to multiple
// minutes.
//
// For example,
// - `container/memory.limit.bytes` has no ingest delay, while
// - `container/memory/request_bytes` has a two-minute ingest delay.
//
// So, even if the metricset collects these metrics at two minutes apart, the metrics
// will have the same timestamp.
//
// When metrics have the same timestamp and dimensions, the metricset will group them
// into a single event. However, the metricset cannot group metrics collected at different
// times.
//
// The metricset cannot group the events from different collections, so we need
// to add an `event.batch_id` field to avoid having two documents with the same
// timestamp and dimensions.
//
eventBatchID, err := uuid.NewUUID()
if err != nil {
return nil, fmt.Errorf("error generating batch ID: %w", err)
}

// Create single events for each time series group.
events := createEventsFromGroups(sdc.ServiceName, timeSeriesGroups, eventBatchID.String())
events := createEventsFromGroups(sdc.ServiceName, timeSeriesGroups)

return events, nil
}
48 changes: 46 additions & 2 deletions x-pack/metricbeat/module/gcp/metrics/timeseries.go
@@ -7,6 +7,7 @@ package metrics
import (
"context"
"fmt"
"strings"

"github.com/elastic/beats/v7/metricbeat/mb"
"github.com/elastic/beats/v7/x-pack/metricbeat/module/gcp"
@@ -74,7 +75,7 @@ func groupMetricsByDimensions(keyValues []KeyValuePoint) map[string][]KeyValuePo
// Collapsing the metrics in each group into a single event should not cause
// any loss of information, since all metrics in a group share the same timestamp
// and dimensions.
func createEventsFromGroups(service string, groups map[string][]KeyValuePoint, eventBatchID string) []mb.Event {
func createEventsFromGroups(service string, groups map[string][]KeyValuePoint) []mb.Event {
events := make([]mb.Event, 0, len(groups))

for _, group := range groups {
@@ -86,8 +87,51 @@ func createEventsFromGroups(service string, groups map[string][]KeyValuePoint, e
MetricSetFields: mapstr.M{},
}

// Collect the metric names in the event and add them to the event
// as `event.metric_names` field.
//
// Why do we need to keep track of all the metric names in the event?
// ===============================================================
//
// Context
// -------
//
// GCP metrics have different ingestion delays; some metrics have zero delay,
// while others have a non-zero delay of up to a few minutes.
//
// For example,
// - `container/memory.limit.bytes` has no ingest delay, while
// - `container/memory/request_bytes` has a two-minute ingest delay.
//
// Since the metricset collects metrics every 60 seconds, the metricset collects
// `container/memory.limit.bytes` and `container/memory/request_bytes`
// in different iterations, even if they have the same timestamp.
//
// Problem
// -------
//
// When TSDB is enabled, two documents cannot have the same timestamp and dimensions.
// If they do, the second document is dropped.
//
// Unfortunately, this is exactly what happens when the metricset collects
// `container/memory.limit.bytes` and `container/memory/request_bytes` in different
// iterations.
//
// Solution
// --------
//
// Since the metricset collects different metrics in different iterations, we need
// to add an `event.metric_names` field to make sure that the events have different
// dimensions.
//
metricNames := []string{}

for _, singleEvent := range group {
// Add the metric values to the event.
_, _ = event.MetricSetFields.Put(singleEvent.Key, singleEvent.Value)

// Add the metric name to build the `event.metric_names` field.
metricNames = append(metricNames, singleEvent.Key)
}

if service == "compute" {
@@ -96,7 +140,7 @@ func createEventsFromGroups(service string, groups map[string][]KeyValuePoint, e
event.RootFields = group[0].ECS
}

_, _ = event.RootFields.Put("event.batch_id", eventBatchID)
_, _ = event.RootFields.Put("event.metric_names", strings.Join(metricNames, ","))

events = append(events, event)
}
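The loop added above can be exercised in isolation with simplified stand-ins for the metricset's types (`kvPoint` and `buildEvent` here are hypothetical reductions; the real `KeyValuePoint` and event types live in `x-pack/metricbeat/module/gcp/metrics`):

```go
package main

import (
	"fmt"
	"strings"
)

// kvPoint is a hypothetical, reduced stand-in for the metricset's
// KeyValuePoint type: one metric name and its sampled value.
type kvPoint struct {
	Key   string
	Value interface{}
}

// buildEvent mirrors the loop in createEventsFromGroups: it copies each
// metric into the event fields and accumulates the metric names that
// become the `event.metric_names` dimension.
func buildEvent(group []kvPoint) (fields map[string]interface{}, metricNames string) {
	fields = map[string]interface{}{}
	names := []string{}
	for _, p := range group {
		fields[p.Key] = p.Value
		names = append(names, p.Key)
	}
	return fields, strings.Join(names, ",")
}

func main() {
	group := []kvPoint{
		{Key: "container.memory.limit.bytes", Value: 1024},
		{Key: "container.memory.request.bytes", Value: 512},
	}
	_, names := buildEvent(group)
	fmt.Println(names)
}
```

Two groups that share a timestamp and resource dimensions but carry different metrics now produce different `event.metric_names` values, so TSDB keeps both documents instead of dropping the second.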
