Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move proxy metrics to a separate server #352

Merged
merged 18 commits into from
Oct 17, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion build/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,6 @@ COPY build/_output/bin/registration-service ${REG_SERVICE}

ENTRYPOINT ["/usr/local/bin/registration-service"]

EXPOSE 8080 8081
EXPOSE 8080 8081 8082

USER ${USER_UID}
9 changes: 6 additions & 3 deletions cmd/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ import (
"context"
"fmt"
"github.com/codeready-toolchain/registration-service/pkg/metrics"
"github.com/prometheus/client_golang/prometheus"
"net/http"
"os"
"os/signal"
Expand Down Expand Up @@ -98,10 +99,12 @@ func main() {
}

// Register metrics
metrics.RegisterCustomMetrics()
proxyMetrics := metrics.NewProxyMetrics(prometheus.NewRegistry())
// Start metrics server
metricsSrv := proxyMetrics.StartMetricsServer()

// Start the proxy server
p, err := proxy.NewProxy(app)
p, err := proxy.NewProxy(app, proxyMetrics)
if err != nil {
panic(errs.Wrap(err, "failed to create proxy"))
}
Expand Down Expand Up @@ -141,7 +144,7 @@ func main() {
}
}()

gracefulShutdown(configuration.GracefulTimeout, srv.HTTPServer(), proxySrv)
gracefulShutdown(configuration.GracefulTimeout, srv.HTTPServer(), proxySrv, metricsSrv)
}

func gracefulShutdown(timeout time.Duration, hs ...*http.Server) {
Expand Down
19 changes: 19 additions & 0 deletions deploy/registration-service.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -98,6 +98,7 @@ objects:
ports:
- containerPort: 8080
- containerPort: 8081
- containerPort: 8082
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is used only in dev environments. The one which is used in production (and e2e tests) are defined here: https://github.com/codeready-toolchain/host-operator/blob/master/deploy/registration-service/registration-service.yaml
Please create a PR to update that file too.

But what I'm wonder is why the e2e-test still pass? Do we have any e2e tests for this metric service? Without updating https://github.com/codeready-toolchain/host-operator/blob/master/deploy/registration-service/registration-service.yaml the e2e-tests for this PR should fail...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we also add names to the container ports (see for example https://kubernetes.io/docs/concepts/services-networking/service/#field-spec-ports) to make them more explicit, and reuse these names in the services as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is used only in dev environments.

It's actually not used even in dev environment. This file is a legacy thing from the time when we were able to deploy reg-service on its own - we don't do that anymore (or at least I'm not aware of anyone who would be doing that)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but anyway, Alexey is right that you need to create PR in host-repo with this change. Let's do it in two steps:

  1. create PR in host-operator repo changing only the values in the template (I don't think that we verify the actual content of the deployment in e2e tests)
  2. when the host-operator PR is merged, create a PR in e2e tests that will be paired with this reg-service one that will verify that the Service is created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply,
@MatousJobanek @alexeykazakov Thanks for your comments and catching the missing host-repo changes. I did have the changes in host-repo, that's how i was able to test it locally indeed, but had missed creating the PR on host-repo. I've updated the host-operator and e2e PR in the description.

command:
- registration-service
imagePullPolicy: IfNotPresent
Expand Down Expand Up @@ -214,6 +215,24 @@ objects:
tls:
termination: edge
wildcardPolicy: None
- kind: Service
apiVersion: v1
metadata:
name: proxy-metrics-service
namespace: ${NAMESPACE}
labels:
provider: codeready-toolchain
run: registration-service
spec:
ports:
- name: "8082"
protocol: TCP
port: 80
targetPort: 8082
selector:
run: registration-service
type: ClusterIP
sessionAffinity: null
parameters:
- name: NAMESPACE
value: 'toolchain-host-operator'
Expand Down
67 changes: 35 additions & 32 deletions pkg/metrics/metrics.go
Original file line number Diff line number Diff line change
@@ -1,48 +1,45 @@
package metrics

import (
"errors"
"github.com/labstack/echo/v4"
glog "github.com/labstack/gommon/log"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promhttp"
"net/http"
logf "sigs.k8s.io/controller-runtime/pkg/log"
)

var log = logf.Log.WithName("registration_metrics")
var Reg *prometheus.Registry

const (
MetricLabelRejected = "Rejected"
MetricsLabelVerbGet = "Get"
MetricsLabelVerbList = "List"
MetricsPort = "8082"
)

// histogram with labels
var (
type ProxyMetrics struct {
// RegServProxyAPIHistogramVec measures the time taken by proxy before forwarding the request
RegServProxyAPIHistogramVec *prometheus.HistogramVec
// RegServWorkspaceHistogramVec measures the response time for either response or error from proxy when there is no routing
RegServWorkspaceHistogramVec *prometheus.HistogramVec
)

// collections
var (
allHistogramVecs = []*prometheus.HistogramVec{}
)
Reg *prometheus.Registry
}

const metricsPrefix = "sandbox_"

func init() {
initMetrics()
}
func initMetrics() {
log.Info("initializing custom metrics")
RegServProxyAPIHistogramVec = newHistogramVec("proxy_api_http_request_time", "time taken by proxy to route to a target cluster", "status_code", "route_to")
RegServWorkspaceHistogramVec = newHistogramVec("proxy_workspace_http_request_time", "time for response of a request to proxy ", "status_code", "kube_verb")
log.Info("custom metrics initialized")
}

// Reset resets all metrics. For testing purpose only!
func Reset() {
log.Info("resetting custom metrics")
initMetrics()
func NewProxyMetrics(reg *prometheus.Registry) *ProxyMetrics {
regServProxyAPIHistogramVec := newHistogramVec("proxy_api_http_request_time", "time taken by proxy to route to a target cluster", "status_code", "route_to")
regServWorkspaceHistogramVec := newHistogramVec("proxy_workspace_http_request_time", "time for response of a request to proxy ", "status_code", "kube_verb")
metrics := &ProxyMetrics{
RegServWorkspaceHistogramVec: regServWorkspaceHistogramVec,
RegServProxyAPIHistogramVec: regServProxyAPIHistogramVec,
Reg: reg,
}
metrics.Reg.MustRegister(metrics.RegServProxyAPIHistogramVec)
metrics.Reg.MustRegister(metrics.RegServWorkspaceHistogramVec)
return metrics
}

func newHistogramVec(name, help string, labels ...string) *prometheus.HistogramVec {
Expand All @@ -51,16 +48,22 @@
Help: help,
Buckets: []float64{0.05, 0.1, 0.25, 0.5, 1, 5, 10},
}, labels)
allHistogramVecs = append(allHistogramVecs, v)
return v
}

// RegisterCustomMetrics registers the custom metrics
func RegisterCustomMetrics() {
Reg = prometheus.NewRegistry()
// register metrics
for _, v := range allHistogramVecs {
Reg.MustRegister(v)
}
log.Info("custom metrics registered")
func (p *ProxyMetrics) StartMetricsServer() *http.Server {
// start server
srv := echo.New()
srv.Logger.SetLevel(glog.INFO)
srv.GET("/metrics", echo.WrapHandler(promhttp.HandlerFor(p.Reg, promhttp.HandlerOpts{DisableCompression: true, Registry: p.Reg})))

log.Info("Starting the Registration-Service Metrics server...")
// listen concurrently to allow for graceful shutdown
go func() {
if err := srv.Start(":" + MetricsPort); !errors.Is(err, http.ErrServerClosed) {
log.Error(err, err.Error())
}

Check warning on line 65 in pkg/metrics/metrics.go

View check run for this annotation

Codecov / codecov/patch

pkg/metrics/metrics.go#L64-L65

Added lines #L64 - L65 were not covered by tests
}()

return srv.Server
}
45 changes: 35 additions & 10 deletions pkg/metrics/metrics_test.go
Original file line number Diff line number Diff line change
@@ -1,20 +1,24 @@
package metrics

import (
"bytes"
"github.com/prometheus/client_golang/prometheus"
promtestutil "github.com/prometheus/client_golang/prometheus/testutil"
clientmodel "github.com/prometheus/client_model/go"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"net/http"
"strings"
"testing"
"time"
)

func TestHistogramVec(t *testing.T) {
// given
reg := prometheus.NewRegistry()
m := newHistogramVec("test_histogram_vec", "test histogram description", "status_code", "kube_verb")
getSuccess, getFailure, listSuccess, listFailure := getExpectedLabelPairs()
RegisterCustomMetrics()
reg.MustRegister(m)

// when
m.WithLabelValues("200", "get").Observe((5 * time.Second).Seconds())
Expand All @@ -28,7 +32,7 @@ func TestHistogramVec(t *testing.T) {
err := promtestutil.CollectAndCompare(m, strings.NewReader(expectedResponseMetadata+expectedResponse), "sandbox_test_histogram_vec")
require.NoError(t, err)

g, er := Reg.Gather()
g, er := reg.Gather()
require.NoError(t, er)
require.Equal(t, 1, len(g))
require.Equal(t, "sandbox_test_histogram_vec", g[0].GetName())
Expand All @@ -51,17 +55,33 @@ func TestHistogramVec(t *testing.T) {
require.Equal(t, 2, len(g[0].Metric[3].Label))
compareLabelPairValues(t, listFailure, g[0].GetMetric()[3].GetLabel())
require.Equal(t, uint64(2), *g[0].Metric[3].Histogram.SampleCount)

}

func TestRegisterCustomMetrics(t *testing.T) {
// when
RegisterCustomMetrics()
func TestMetricsServer(t *testing.T) {
reg := prometheus.NewRegistry()
testMetrics := NewProxyMetrics(reg)
server := testMetrics.StartMetricsServer()
require.NotNil(t, server)
defer func() {
_ = server.Close()
}()

req, err := http.NewRequest("GET", "http://localhost:8082/metrics", nil)
require.NoError(t, err)
require.NotNil(t, req)
resp, err := http.DefaultClient.Do(req)
require.NoError(t, err)
require.NotNil(t, resp)
require.Equal(t, http.StatusOK, resp.StatusCode)
require.Equal(t, "text/plain; version=0.0.4; charset=utf-8", resp.Header.Get("Content-Type"))
// compare the body of the response as well
defer resp.Body.Close()
buf := new(bytes.Buffer)
_, err = buf.ReadFrom(resp.Body)
require.NoError(t, err)
assert.Equal(t, expectedServerBlankResponse, buf.String())

// then
// verify all metrics were registered successfully
for _, m := range allHistogramVecs {
assert.True(t, Reg.Unregister(m))
}
}

var expectedResponseMetadata = `
Expand Down Expand Up @@ -109,6 +129,11 @@ var expectedResponse = `
sandbox_test_histogram_vec_sum{kube_verb="list",status_code="500"} 3.001
sandbox_test_histogram_vec_count{kube_verb="list",status_code="500"} 2
`
var expectedServerBlankResponse = `# HELP promhttp_metric_handler_errors_total Total number of internal errors encountered by the promhttp metric handler.
# TYPE promhttp_metric_handler_errors_total counter
promhttp_metric_handler_errors_total{cause="encoding"} 0
promhttp_metric_handler_errors_total{cause="gathering"} 0
`

func compareLabelPairValues(t *testing.T, expected []clientmodel.LabelPair, labelPairs []*clientmodel.LabelPair) {
for i := range labelPairs {
Expand Down
20 changes: 0 additions & 20 deletions pkg/proxy/handlers/metrics.go

This file was deleted.

37 changes: 0 additions & 37 deletions pkg/proxy/handlers/metrics_test.go

This file was deleted.

14 changes: 8 additions & 6 deletions pkg/proxy/handlers/spacelister.go
Original file line number Diff line number Diff line change
Expand Up @@ -40,12 +40,14 @@
type SpaceLister struct {
GetSignupFunc func(ctx *gin.Context, userID, username string, checkUserSignupCompleted bool) (*signup.Signup, error)
GetInformerServiceFunc func() service.InformerService
ProxyMetrics *metrics.ProxyMetrics
}

func NewSpaceLister(app application.Application) *SpaceLister {
func NewSpaceLister(app application.Application, proxyMetrics *metrics.ProxyMetrics) *SpaceLister {

Check warning on line 46 in pkg/proxy/handlers/spacelister.go

View check run for this annotation

Codecov / codecov/patch

pkg/proxy/handlers/spacelister.go#L46

Added line #L46 was not covered by tests
return &SpaceLister{
GetSignupFunc: app.SignupService().GetSignupFromInformer,
GetInformerServiceFunc: app.InformerService,
ProxyMetrics: proxyMetrics,

Check warning on line 50 in pkg/proxy/handlers/spacelister.go

View check run for this annotation

Codecov / codecov/patch

pkg/proxy/handlers/spacelister.go#L50

Added line #L50 was not covered by tests
}
}

Expand All @@ -54,10 +56,10 @@
requestReceivedTime := ctx.Get(context.RequestReceivedTime).(time.Time)
workspaces, err := s.ListUserWorkspaces(ctx)
if err != nil {
metrics.RegServWorkspaceHistogramVec.WithLabelValues(fmt.Sprintf("%d", http.StatusInternalServerError), metrics.MetricsLabelVerbList).Observe(time.Since(requestReceivedTime).Seconds()) // using list as the default value for verb to minimize label combinations for prometheus to process
s.ProxyMetrics.RegServWorkspaceHistogramVec.WithLabelValues(fmt.Sprintf("%d", http.StatusInternalServerError), metrics.MetricsLabelVerbList).Observe(time.Since(requestReceivedTime).Seconds()) // using list as the default value for verb to minimize label combinations for prometheus to process
return errorResponse(ctx, apierrors.NewInternalError(err))
}
metrics.RegServWorkspaceHistogramVec.WithLabelValues(fmt.Sprintf("%d", http.StatusOK), metrics.MetricsLabelVerbList).Observe(time.Since(requestReceivedTime).Seconds())
s.ProxyMetrics.RegServWorkspaceHistogramVec.WithLabelValues(fmt.Sprintf("%d", http.StatusOK), metrics.MetricsLabelVerbList).Observe(time.Since(requestReceivedTime).Seconds())
return listWorkspaceResponse(ctx, workspaces)
}

Expand All @@ -66,16 +68,16 @@
requestReceivedTime := ctx.Get(context.RequestReceivedTime).(time.Time)
workspace, err := s.GetUserWorkspace(ctx)
if err != nil {
metrics.RegServWorkspaceHistogramVec.WithLabelValues(fmt.Sprintf("%d", http.StatusInternalServerError), metrics.MetricsLabelVerbGet).Observe(time.Since(requestReceivedTime).Seconds()) // using list as the default value for verb to minimize label combinations for prometheus to process
s.ProxyMetrics.RegServWorkspaceHistogramVec.WithLabelValues(fmt.Sprintf("%d", http.StatusInternalServerError), metrics.MetricsLabelVerbGet).Observe(time.Since(requestReceivedTime).Seconds()) // using list as the default value for verb to minimize label combinations for prometheus to process
return errorResponse(ctx, apierrors.NewInternalError(err))
}
if workspace == nil {
// not found
metrics.RegServWorkspaceHistogramVec.WithLabelValues(fmt.Sprintf("%d", http.StatusNotFound), metrics.MetricsLabelVerbGet).Observe(time.Since(requestReceivedTime).Seconds())
s.ProxyMetrics.RegServWorkspaceHistogramVec.WithLabelValues(fmt.Sprintf("%d", http.StatusNotFound), metrics.MetricsLabelVerbGet).Observe(time.Since(requestReceivedTime).Seconds())
r := schema.GroupResource{Group: "toolchain.dev.openshift.com", Resource: "workspaces"}
return errorResponse(ctx, apierrors.NewNotFound(r, ctx.Param("workspace")))
}
metrics.RegServWorkspaceHistogramVec.WithLabelValues(fmt.Sprintf("%d", http.StatusOK), metrics.MetricsLabelVerbGet).Observe(time.Since(requestReceivedTime).Seconds())
s.ProxyMetrics.RegServWorkspaceHistogramVec.WithLabelValues(fmt.Sprintf("%d", http.StatusOK), metrics.MetricsLabelVerbGet).Observe(time.Since(requestReceivedTime).Seconds())
return getWorkspaceResponse(ctx, workspace)
}

Expand Down
Loading