Merge pull request #27 from CoderPoet/fix/http-metrics-high-cardinality
feat(metrics): optimize metrics
CoderPoet authored Nov 17, 2023
2 parents 52d1501 + 46fdbb4 commit 8671b6c
Showing 16 changed files with 446 additions and 120 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/pr-check.yml
@@ -21,7 +21,7 @@ jobs:
 - name: Set up Go
   uses: actions/setup-go@v4
   with:
-    go-version: 1.19
+    go-version: 1.21.3

 - uses: actions/cache@v3
   with:
4 changes: 2 additions & 2 deletions .github/workflows/tests.yml
@@ -26,7 +26,7 @@ jobs:
 - name: Set up Go
   uses: actions/setup-go@v4
   with:
-    go-version: 1.19
+    go-version: 1.21.3
 - name: Lint
   uses: golangci/golangci-lint-action@v3
   with:
@@ -43,7 +43,7 @@ jobs:
 - name: Set up Go
   uses: actions/setup-go@v4
   with:
-    go-version: 1.19
+    go-version: 1.21.3

 - uses: actions/cache@v3
   with:
14 changes: 8 additions & 6 deletions README.md
@@ -133,9 +133,10 @@ h.GET("/ping", func(c context.Context, ctx *app.RequestContext) {

Below is a table of HTTP server metric instruments.

-| Name | Instrument Type | Unit | Unit | Description |
-|-------------------------------|---------------------------------------------------|--------------|-------------------------------------------|------------------------------------------------------------------------------|
-| `http.server.duration` | Histogram | milliseconds | `ms` | measures the duration inbound HTTP requests |
+| Name                        | Instrument Type | Unit         | Unit (UCUM) | Description                                      |
+|-----------------------------|-----------------|--------------|-------------|--------------------------------------------------|
+| `http.server.duration`      | Histogram       | milliseconds | `ms`        | measures the duration of inbound HTTP requests   |
+| `http.server.request_count` | Counter         | count        | `count`     | measures the total number of incoming requests   |


#### Hertz Client
@@ -145,6 +146,7 @@ Below is a table of HTTP client metric instruments.
| Name | Instrument Type ([*](README.md#instrument-types)) | Unit | Unit ([UCUM](README.md#instrument-units)) | Description |
|-----------------------------|---------------------------------------------------|--------------|-------------------------------------------|----------------------------------------------------------|
| `http.client.duration` | Histogram | milliseconds | `ms` | measures the duration of outbound HTTP requests |
+| `http.client.request_count` | Counter | count | `count` | measures the total number of client requests |


### R.E.D
@@ -155,15 +157,15 @@ the number of requests, per second, your services are serving.

eg: QPS
```
-sum(rate(http_server_duration_count{}[5m])) by (service_name, http_method)
+sum(rate(http_server_request_count_total{}[5m])) by (service_name, http_method)
```

#### Errors
the number of failed requests per second.

eg: Error ratio
```
-sum(rate(http_server_duration_count{status_code="Error"}[5m])) by (service_name, http_method) / sum(rate(http_server_duration_count{}[5m])) by (service_name, http_method)
+sum(rate(http_server_request_count_total{status_code="Error"}[5m])) by (service_name, http_method) / sum(rate(http_server_request_count_total{}[5m])) by (service_name, http_method)
```

#### Duration
@@ -177,7 +179,7 @@ histogram_quantile(0.99, sum(rate(http_server_duration_bucket{}[5m])) by (le, se
### Service Topology Map
The `http.server.duration` metric records both the peer service and the current service as dimensions. Aggregating on these two dimensions yields a service topology map:
```
-sum(rate(http_server_duration_count{}[5m])) by (service_name, peer_service)
+sum(rate(http_server_request_count_total{}[5m])) by (service_name, peer_service)
```

### Runtime Metrics
20 changes: 11 additions & 9 deletions README_CN.md
@@ -140,17 +140,19 @@ h.GET("/ping", func(c context.Context, ctx *app.RequestContext) {

Below is a table of HTTP server metric instruments.

-| Name | Instrument Type | Unit | Unit | Description |
-|-------------------------------|---------------------------------------------------|--------------|-------------------------------------------|------------------------------------------------------------------------------|
-| `http.server.duration` | Histogram | milliseconds | `ms` | measures the duration of inbound HTTP requests |
+| Name                        | Instrument Type | Unit         | Unit (UCUM) | Description                                    |
+|-----------------------------|-----------------|--------------|-------------|------------------------------------------------|
+| `http.server.duration`      | Histogram       | milliseconds | `ms`        | measures the duration of inbound HTTP requests |
+| `http.server.request_count` | Counter         | count        | `count`     | measures the number of inbound HTTP requests   |

#### Hertz Client

Below is a table of HTTP client metric instruments.

-| Name | Instrument Type | Unit | Unit (UCUM) | Description |
-|-----------------------------|---------------------------------------------------|--------------|-------------------------------------------|----------------------------------------------------------|
-| `http.client.duration` | Histogram | milliseconds | `ms` | measures the duration of outbound HTTP requests |
+| Name                        | Instrument Type | Unit         | Unit (UCUM) | Description                                     |
+|-----------------------------|-----------------|--------------|-------------|-------------------------------------------------|
+| `http.client.duration`      | Histogram       | milliseconds | `ms`        | measures the duration of outbound HTTP requests |
+| `http.client.request_count` | Counter         | count        | `count`     | measures the number of outbound HTTP requests   |


### R.E.D
@@ -163,7 +165,7 @@ R.E.D (Rate, Errors, Duration) defines, for every microservice in the architecture, the three
e.g. QPS (Queries Per Second)

```
-sum(rate(http_server_duration_count{}[5m])) by (service_name, http_method)
+sum(rate(http_server_request_count_total{}[5m])) by (service_name, http_method)
```

#### Errors
@@ -173,7 +175,7 @@ sum(rate(http_server_duration_count{}[5m])) by (service_name, http_method)
e.g. error ratio

```
-sum(rate(http_server_duration_count{status_code="Error"}[5m])) by (service_name, http_method) / sum(rate(http_server_duration_count{}[5m])) by (service_name, http_method)
+sum(rate(http_server_request_count_total{status_code="Error"}[5m])) by (service_name, http_method) / sum(rate(http_server_request_count_total{}[5m])) by (service_name, http_method)
```

#### Duration
@@ -190,7 +192,7 @@ histogram_quantile(0.99, sum(rate(http_server_duration_bucket{}[5m])) by (le, se

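The `http.server.duration` metric records both the peer service and the current service as dimensions. Aggregating on these two dimensions yields a service topology map: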
```
-sum(rate(http_server_duration_count{}[5m])) by (service_name, peer_service)
+sum(rate(http_server_request_count_total{}[5m])) by (service_name, peer_service)
```

### Runtime Metrics
29 changes: 29 additions & 0 deletions testutil/go.mod
@@ -0,0 +1,29 @@
module github.com/hertz-contrib/obs-opentelemetry/testutil

go 1.21

require (
	github.com/prometheus/client_golang v1.17.0
	go.opentelemetry.io/otel v1.20.0
	go.opentelemetry.io/otel/exporters/prometheus v0.43.0
	go.opentelemetry.io/otel/exporters/stdout/stdouttrace v1.20.0
	go.opentelemetry.io/otel/metric v1.20.0
	go.opentelemetry.io/otel/sdk v1.20.0
	go.opentelemetry.io/otel/sdk/metric v1.20.0
)

require (
	github.com/beorn7/perks v1.0.1 // indirect
	github.com/cespare/xxhash/v2 v2.2.0 // indirect
	github.com/davecgh/go-spew v1.1.1 // indirect
	github.com/go-logr/logr v1.3.0 // indirect
	github.com/go-logr/stdr v1.2.2 // indirect
	github.com/golang/protobuf v1.5.3 // indirect
	github.com/matttproud/golang_protobuf_extensions v1.0.4 // indirect
	github.com/prometheus/client_model v0.5.0 // indirect
	github.com/prometheus/common v0.44.0 // indirect
	github.com/prometheus/procfs v0.11.1 // indirect
	go.opentelemetry.io/otel/trace v1.20.0 // indirect
	golang.org/x/sys v0.14.0 // indirect
	google.golang.org/protobuf v1.31.0 // indirect
)
56 changes: 56 additions & 0 deletions testutil/go.sum
@@ -0,0 +1,56 @@
github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=
github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw=
github.com/cespare/xxhash/v2 v2.2.0 h1:DC2CZ1Ep5Y4k3ZQ899DldepgrayRUGE6BBZ/cd9Cj44=
github.com/cespare/xxhash/v2 v2.2.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/go-logr/logr v1.2.2/go.mod h1:jdQByPbusPIv2/zmleS9BjJVeZ6kBagPoEUsqbVz/1A=
github.com/go-logr/logr v1.3.0 h1:2y3SDp0ZXuc6/cjLSZ+Q3ir+QB9T/iG5yYRXqsagWSY=
github.com/go-logr/logr v1.3.0/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY=
github.com/go-logr/stdr v1.2.2 h1:hSWxHoqTgW2S2qGc0LTAI563KZ5YKYRhT3MFKZMbjag=
github.com/go-logr/stdr v1.2.2/go.mod h1:mMo/vtBO5dYbehREoey6XUKy/eSumjCCveDpRre4VKE=
github.com/golang/protobuf v1.2.0/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
github.com/golang/protobuf v1.5.0/go.mod h1:FsONVRAS9T7sI+LIUmWTfcYkHO4aIWwzhcaSAoJOfIk=
github.com/golang/protobuf v1.5.3 h1:KhyjKVUg7Usr/dYsdSqoFveMYd5ko72D+zANwlG1mmg=
github.com/golang/protobuf v1.5.3/go.mod h1:XVQd3VNwM+JqD3oG2Ue2ip4fOMUkwXdXDdiuN0vRsmY=
github.com/google/go-cmp v0.5.5/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI=
github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
github.com/matttproud/golang_protobuf_extensions v1.0.4 h1:mmDVorXM7PCGKw94cs5zkfA9PSy5pEvNWRP0ET0TIVo=
github.com/matttproud/golang_protobuf_extensions v1.0.4/go.mod h1:BSXmuO+STAnVfrANrmjBb36TMTDstsz7MSK+HVaYKv4=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/prometheus/client_golang v1.17.0 h1:rl2sfwZMtSthVU752MqfjQozy7blglC+1SOtjMAMh+Q=
github.com/prometheus/client_golang v1.17.0/go.mod h1:VeL+gMmOAxkS2IqfCq0ZmHSL+LjWfWDUmp1mBz9JgUY=
github.com/prometheus/client_model v0.5.0 h1:VQw1hfvPvk3Uv6Qf29VrPF32JB6rtbgI6cYPYQjL0Qw=
github.com/prometheus/client_model v0.5.0/go.mod h1:dTiFglRmd66nLR9Pv9f0mZi7B7fk5Pm3gvsjB5tr+kI=
github.com/prometheus/common v0.44.0 h1:+5BrQJwiBB9xsMygAB3TNvpQKOwlkc25LbISbrdOOfY=
github.com/prometheus/common v0.44.0/go.mod h1:ofAIvZbQ1e/nugmZGz4/qCb9Ap1VoSTIO7x0VV9VvuY=
github.com/prometheus/procfs v0.11.1 h1:xRC8Iq1yyca5ypa9n1EZnWZkt7dwcoRPQwX/5gwaUuI=
github.com/prometheus/procfs v0.11.1/go.mod h1:eesXgaPo1q7lBpVMoMy0ZOFTth9hBn4W/y0/p/ScXhY=
github.com/stretchr/testify v1.8.4 h1:CcVxjf3Q8PM0mHUKJCdn+eZZtm5yQwehR5yeSVQQcUk=
github.com/stretchr/testify v1.8.4/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo=
go.opentelemetry.io/otel v1.20.0 h1:vsb/ggIY+hUjD/zCAQHpzTmndPqv/ml2ArbsbfBYTAc=
go.opentelemetry.io/otel v1.20.0/go.mod h1:oUIGj3D77RwJdM6PPZImDpSZGDvkD9fhesHny69JFrs=
go.opentelemetry.io/otel/exporters/prometheus v0.43.0 h1:Skkl6akzvdWweXX6LLAY29tyFSO6hWZ26uDbVGTDXe8=
go.opentelemetry.io/otel/exporters/prometheus v0.43.0/go.mod h1:nZStMoc1H/YJpRjSx9IEX4abBMekORTLQcTUT1CgLkg=
go.opentelemetry.io/otel/exporters/stdout/stdouttrace v1.20.0 h1:4s9HxB4azeeQkhY0GE5wZlMj4/pz8tE5gx2OQpGUw58=
go.opentelemetry.io/otel/exporters/stdout/stdouttrace v1.20.0/go.mod h1:djVA3TUJ2fSdMX0JE5XxFBOaZzprElJoP7fD4vnV2SU=
go.opentelemetry.io/otel/metric v1.20.0 h1:ZlrO8Hu9+GAhnepmRGhSU7/VkpjrNowxRN9GyKR4wzA=
go.opentelemetry.io/otel/metric v1.20.0/go.mod h1:90DRw3nfK4D7Sm/75yQ00gTJxtkBxX+wu6YaNymbpVM=
go.opentelemetry.io/otel/sdk v1.20.0 h1:5Jf6imeFZlZtKv9Qbo6qt2ZkmWtdWx/wzcCbNUlAWGM=
go.opentelemetry.io/otel/sdk v1.20.0/go.mod h1:rmkSx1cZCm/tn16iWDn1GQbLtsW/LvsdEEFzCSRM6V0=
go.opentelemetry.io/otel/sdk/metric v1.20.0 h1:5eD40l/H2CqdKmbSV7iht2KMK0faAIL2pVYzJOWobGk=
go.opentelemetry.io/otel/sdk/metric v1.20.0/go.mod h1:AGvpC+YF/jblITiafMTYgvRBUiwi9hZf0EYE2E5XlS8=
go.opentelemetry.io/otel/trace v1.20.0 h1:+yxVAPZPbQhbC3OfAkeIVTky6iTFpcr4SiY9om7mXSQ=
go.opentelemetry.io/otel/trace v1.20.0/go.mod h1:HJSK7F/hA5RlzpZ0zKDCHCDHm556LCDtKaAo6JmBFUU=
golang.org/x/sync v0.0.0-20181221193216-37e7f081c4d4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sys v0.14.0 h1:Vz7Qs629MkJkGyHxUlRHizWJRG2j8fbQKjELVSNhy7Q=
golang.org/x/sys v0.14.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
google.golang.org/protobuf v1.26.0-rc.1/go.mod h1:jlhhOSvTdKEhbULTjvd4ARK9grFBp09yW+WbY/TyQbw=
google.golang.org/protobuf v1.26.0/go.mod h1:9q0QmTI4eRPtz6boOQmLYwt+qCgq0jsYwAQnmE0givc=
google.golang.org/protobuf v1.31.0 h1:g0LDEJHgrBl9N9r17Ru3sqWhkIx2NB67okBHPwC7hs8=
google.golang.org/protobuf v1.31.0/go.mod h1:HV8QOd/L58Z+nl8r43ehVNZIU/HEI6OcFqwMG9pJV4I=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
108 changes: 108 additions & 0 deletions testutil/otel.go
@@ -0,0 +1,108 @@
// Copyright 2022 CloudWeGo Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package testutil

import (
	"os"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/testutil"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/sdk/metric"
	"go.opentelemetry.io/otel/sdk/resource"

	otelprom "go.opentelemetry.io/otel/exporters/prometheus"
	stdout "go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	otelmetric "go.opentelemetry.io/otel/metric"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
)

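// OtelTestProvider returns a tracer provider, a meter provider, and the
// Prometheus registry that backs the meter provider, for use in tests.
// It panics if either provider cannot be initialized.
func OtelTestProvider() (*sdktrace.TracerProvider, otelmetric.MeterProvider, *prometheus.Registry) {
	// Dedicated Prometheus registry, so tests do not touch the default one.
	registry := prometheus.NewRegistry()

	// Initialize the tracer provider.
	tracerProvider, err := initTracer()
	if err != nil {
		panic(err)
	}

	// Initialize the meter provider, exporting into the registry.
	meterProvider, err := initMeterProvider(registry)
	if err != nil {
		panic(err)
	}

	return tracerProvider, meterProvider, registry
}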

// GatherAndCompare gathers metrics from registry and compares them with the
// expected metrics stored in the file at expectedFilePath. If any metricName
// is given, only metrics with those names are compared.
func GatherAndCompare(registry *prometheus.Registry, expectedFilePath string, metricName ...string) error {
	file, err := os.Open(expectedFilePath)
	if err != nil {
		return err
	}
	defer func() {
		_ = file.Close()
	}()

	return testutil.GatherAndCompare(registry, file, metricName...)
}

func initMeterProvider(registry *prometheus.Registry) (otelmetric.MeterProvider, error) {
	exporter, err := initMetricExporter(registry)
	if err != nil {
		return nil, err
	}
	provider := metric.NewMeterProvider(metric.WithReader(exporter))
	return provider, nil
}

func initMetricExporter(registry *prometheus.Registry) (*otelprom.Exporter, error) {
	return otelprom.New(
		otelprom.WithRegisterer(registry),
	)
}

func initTracer() (*sdktrace.TracerProvider, error) {
	// Create a stdout exporter so that the collected spans can be inspected.
	exporter, err := stdout.New(stdout.WithPrettyPrint())
	if err != nil {
		return nil, err
	}

	// For the demonstration, use the sdktrace.AlwaysSample sampler to sample all traces.
	// In a production application, use a ratio-based sampler such as
	// sdktrace.TraceIDRatioBased with a desired probability.
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithSampler(sdktrace.AlwaysSample()),
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String("test-server"),
			semconv.ServiceNamespaceKey.String("test-ns"),
			semconv.DeploymentEnvironmentKey.String("test-env"),
		)),
	)
	otel.SetTracerProvider(tp)
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))
	return tp, nil
}
67 changes: 67 additions & 0 deletions tracing/example_test.go
@@ -0,0 +1,67 @@
// Copyright 2022 CloudWeGo Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package tracing_test

import (
	"context"
	"testing"
	"time"

	"github.com/cloudwego/hertz/pkg/app"
	"github.com/cloudwego/hertz/pkg/app/client"
	"github.com/cloudwego/hertz/pkg/app/server"
	"github.com/cloudwego/hertz/pkg/common/hlog"
	"github.com/cloudwego/hertz/pkg/protocol/consts"
	"github.com/hertz-contrib/obs-opentelemetry/testutil"
	hertztracing "github.com/hertz-contrib/obs-opentelemetry/tracing"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
	"go.opentelemetry.io/otel"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func TestMetricsExample(t *testing.T) {
	// Set up test tracer and meter providers backed by a Prometheus registry.
	tracerProvider, meterProvider, registry := testutil.OtelTestProvider()
	defer func(tracerProvider *sdktrace.TracerProvider, ctx context.Context) {
		_ = tracerProvider.Shutdown(ctx)
	}(tracerProvider, context.Background())
	otel.SetMeterProvider(meterProvider)

	// Server example.
	tracer, cfg := hertztracing.NewServerTracer()
	h := server.Default(tracer, server.WithHostPorts(":39888"))
	h.Use(hertztracing.ServerMiddleware(cfg))
	h.GET("/ping", func(c context.Context, ctx *app.RequestContext) {
		hlog.CtxDebugf(c, "message received successfully")
		ctx.JSON(consts.StatusOK, "pong")
	})
	go h.Spin()

	// Give the server time to start listening.
	<-time.After(time.Millisecond * 500)

	// Client example.
	c, err := client.NewClient()
	require.NoError(t, err)
	c.Use(hertztracing.ClientMiddleware())
	_, body, err := c.Get(context.Background(), nil, "http://localhost:39888/ping?foo=bar")
	require.NoError(t, err)
	assert.NotNil(t, body)

	// Compare the gathered request counters with the expected output.
	assert.NoError(t, testutil.GatherAndCompare(
		registry, "testdata/hertz_request_metrics.txt",
		"http_server_request_count_total", "http_client_request_count_total",
	))
}