
Verify compliant metric SDK specification implementation: MeterProvider/Aggregation #3646

Closed
2 tasks done
Tracked by #3674
MrAlias opened this issue Feb 3, 2023 · 19 comments
Labels
area:metrics Part of OpenTelemetry Metrics pkg:SDK Related to an SDK package

Comments


MrAlias commented Feb 3, 2023

  • Identify all the normative requirements, recommendations, and options the specification defines as comments to this issue
  • Ensure the current metric SDK implementation is compliant with the normative requirements, recommendations, and options identified in those comments.
@MrAlias MrAlias added pkg:SDK Related to an SDK package area:metrics Part of OpenTelemetry Metrics labels Feb 3, 2023
@MrAlias MrAlias self-assigned this Jun 1, 2023

MrAlias commented Jun 1, 2023

Note: the term aggregation is used instead of aggregator. It is recommended that implementors reserve the "aggregator" term for the future when the SDK allows custom aggregation implementations.

There is no exported object or package with the name Aggregator.


MrAlias commented Jun 1, 2023

Note: Implementors MAY choose the best idiomatic practice for their language to represent the semantic of an Aggregation and optional configuration parameters.

Not sure how much value this statement has, but for what it's worth, we chose the most idiomatic solutions we could here 😆


MrAlias commented Jun 1, 2023

The SDK MUST provide the following Aggregation to support the Metric Points in the Metrics Data Model.

// Drop is an aggregation that drops all recorded data.
type Drop struct{} // Drop has no parameters.

// Default is an aggregation that uses the default instrument kind selection
// mapping to select another aggregation. A metric reader can be configured to
// make an aggregation selection based on instrument kind that differs from
// the default. This aggregation ensures the default is used.
//
// See the "go.opentelemetry.io/otel/sdk/metric".DefaultAggregationSelector
// for information about the default instrument kind selection mapping.
type Default struct{} // Default has no parameters.

// Sum is an aggregation that summarizes a set of measurements as their
// arithmetic sum.
type Sum struct{} // Sum has no parameters.

// LastValue is an aggregation that summarizes a set of measurements as the
// last one made.
type LastValue struct{} // LastValue has no parameters.

// ExplicitBucketHistogram is an aggregation that summarizes a set of
// measurements as a histogram with explicitly defined buckets.
type ExplicitBucketHistogram struct {
	// Boundaries are the increasing bucket boundary values. Boundary values
	// define bucket upper bounds. Buckets are exclusive of their lower
	// boundary and inclusive of their upper bound (except at positive
	// infinity). A measurement is defined to fall into the greatest-numbered
	// bucket with a boundary that is greater than or equal to the
	// measurement. As an example, boundaries defined as:
	//
	//	[]float64{0, 5, 10, 25, 50, 75, 100, 250, 500, 1000}
	//
	// Will define these buckets:
	//
	//	(-∞, 0], (0, 5.0], (5.0, 10.0], (10.0, 25.0], (25.0, 50.0],
	//	(50.0, 75.0], (75.0, 100.0], (100.0, 250.0], (250.0, 500.0],
	//	(500.0, 1000.0], (1000.0, +∞)
	Boundaries []float64
	// NoMinMax indicates whether to not record the min and max of the
	// distribution. By default, these extrema are recorded.
	//
	// Recording these extrema for cumulative data is expected to have little
	// value; they will represent the entire life of the instrument instead of
	// just the current collection cycle. It is recommended to set this to true
	// for that type of data to avoid computing the low-value extrema.
	NoMinMax bool
}
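The boundary semantics described in the ExplicitBucketHistogram doc comment can be illustrated with a small standalone sketch (bucketIndex is a hypothetical helper for illustration, not part of the SDK):

```go
package main

import "fmt"

// bucketIndex returns the bucket a measurement falls into given increasing
// upper-bound boundaries: buckets are exclusive of their lower boundary and
// inclusive of their upper bound, with a final (last boundary, +Inf) bucket.
func bucketIndex(v float64, boundaries []float64) int {
	for i, b := range boundaries {
		if v <= b {
			return i
		}
	}
	return len(boundaries)
}

func main() {
	b := []float64{0, 5, 10, 25, 50, 75, 100, 250, 500, 1000}
	fmt.Println(bucketIndex(0, b))    // 0: falls in (-∞, 0]
	fmt.Println(bucketIndex(5, b))    // 1: falls in (0, 5.0]
	fmt.Println(bucketIndex(1001, b)) // 10: falls in (1000.0, +∞)
}
```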


MrAlias commented Jun 1, 2023

The SDK SHOULD provide the following Aggregation:

We do not support this. It is tracked here: #2966

Given this is a recommendation, it should be completed before the stable release of the metric SDK.


MrAlias commented Jun 1, 2023

Histogram Aggregations

...

  • Arithmetic sum of Measurement values in population. This SHOULD NOT be collected when used with
    instruments that record negative measurements (e.g. UpDownCounter or ObservableGauge).

Our instrument aggregation selection would be the place this is done:

case aggregation.ExplicitBucketHistogram:
	switch temporality {
	case metricdata.CumulativeTemporality:
		return internal.NewCumulativeHistogram[N](a), nil
	case metricdata.DeltaTemporality:
		return internal.NewDeltaHistogram[N](a), nil

It does not look to be conforming to this recommendation. An issue is needed to track this non-compliance.
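A minimal sketch of what conforming selection logic could look like, using hypothetical InstrumentKind values (not the SDK's actual types):

```go
package main

import "fmt"

// InstrumentKind is a hypothetical stand-in for the SDK's instrument kinds.
type InstrumentKind int

const (
	Counter InstrumentKind = iota
	UpDownCounter
	Histogram
	ObservableGauge
)

// recordHistogramSum reports whether a histogram aggregation should collect
// the arithmetic sum for the given instrument kind. Per the specification,
// the sum SHOULD NOT be collected for instruments that can record negative
// measurements (e.g. UpDownCounter or ObservableGauge).
func recordHistogramSum(k InstrumentKind) bool {
	switch k {
	case UpDownCounter, ObservableGauge:
		return false
	}
	return true
}

func main() {
	fmt.Println(recordHistogramSum(Counter))       // true
	fmt.Println(recordHistogramSum(UpDownCounter)) // false
}
```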


MrAlias commented Jun 1, 2023

It does not look to be conforming to this recommendation. An issue is needed to track this non-compliance.

#4161


MrAlias commented Jun 1, 2023

Implementations are REQUIRED to accept the entire normal range of IEEE floating point values (i.e., all values except for +Inf, -Inf and NaN values).

Implementations SHOULD NOT incorporate non-normal values (i.e., +Inf, -Inf, and NaNs) into the sum, min, and max fields, because these values do not map into a valid bucket.

Implementations MAY round subnormal values away from zero to the nearest normal value.

Exponential histograms are not yet supported so these statements do not currently apply.


MrAlias commented Jun 1, 2023

The implementation MUST maintain reasonable minimum and maximum scale parameters that the automatic scale parameter will not exceed. The maximum scale is defined by the MaxScale configuration parameter.

Exponential histograms are not yet supported so this statement does not currently apply.


MrAlias commented Jun 1, 2023

When the histogram contains not more than one value in either of the positive or negative ranges, the implementation SHOULD use the maximum scale.

Exponential histograms are not yet supported so this statement does not currently apply.


MrAlias commented Jun 1, 2023

Implementations SHOULD adjust the histogram scale as necessary to maintain the best resolution possible, within the constraint of maximum size (max number of buckets). Best resolution (highest scale) is achieved when the number of positive or negative range buckets exceeds half the maximum size, such that increasing scale by one would not be possible given the size constraint.

Exponential histograms are not yet supported so these statements do not currently apply.



MrAlias commented Aug 14, 2023

Base2 exponential histogram support has been added. That part of the specification needs to be re-evaluated.


MrAlias commented Aug 14, 2023

Implementations are REQUIRED to accept the entire normal range of IEEE floating point values (i.e., all values except for +Inf, -Inf and NaN values).

We support all normal and subnormal values; the subnormal values are coerced to a normal value, which is covered by:

Implementations MAY round subnormal values away from zero to the nearest normal value.
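The permitted coercion can be sketched in a few lines of standalone Go (coerceSubnormal is an illustrative helper, not the SDK's code):

```go
package main

import (
	"fmt"
	"math"
)

// smallestNormal is the smallest positive normal float64 (2^-1022).
const smallestNormal = 0x1p-1022

// coerceSubnormal rounds subnormal values away from zero to the nearest
// normal value, as the specification permits. Zero, normal, and non-finite
// values are returned unchanged.
func coerceSubnormal(v float64) float64 {
	if v != 0 && v > -smallestNormal && v < smallestNormal {
		return math.Copysign(smallestNormal, v)
	}
	return v
}

func main() {
	fmt.Println(coerceSubnormal(1e-310) == smallestNormal) // true: subnormal rounded up
	fmt.Println(coerceSubnormal(1.5))                      // 1.5: normal values pass through
}
```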


MrAlias commented Aug 14, 2023

Implementations SHOULD NOT incorporate non-normal values (i.e., +Inf, -Inf, and NaNs) into the sum, min, and max fields, because these values do not map into a valid bucket.

This is not compliant.

Modifying the TestExpoHistogramDataPointRecord/*MinMaxSum tests:

--- a/sdk/metric/internal/aggregate/exponential_histogram_test.go
+++ b/sdk/metric/internal/aggregate/exponential_histogram_test.go
@@ -177,6 +177,18 @@ func testExpoHistogramDataPointRecordMinMaxSum[N int64 | float64](t *testing.T)
                        values:   []N{2, 4, 1},
                        expected: expectedMinMaxSum[N]{1, 4, 7, 3},
                },
+               {
+                       values:   []N{2, 4, 1, N(math.Inf(1))},
+                       expected: expectedMinMaxSum[N]{1, 4, 7, 4},
+               },
+               {
+                       values:   []N{2, 4, 1, N(math.Inf(-1))},
+                       expected: expectedMinMaxSum[N]{1, 4, 7, 4},
+               },
+               {
+                       values:   []N{2, 4, 1, N(math.NaN())},
+                       expected: expectedMinMaxSum[N]{1, 4, 7, 4},
+               },
                {
                        values:   []N{4, 4, 4, 2, 16, 1},
                        expected: expectedMinMaxSum[N]{1, 16, 31, 6},

Results in the following:

$ go test ./...
--- FAIL: TestExpoHistogramDataPointRecord (0.00s)
    --- FAIL: TestExpoHistogramDataPointRecord/float64_MinMaxSum (0.00s)
        --- FAIL: TestExpoHistogramDataPointRecord/float64_MinMaxSum/[2_4_1_+Inf] (0.00s)
            exponential_histogram_test.go:208:
                	Error Trace:	/home/tyler/go/src/go.opentelemetry.io/otel/sdk/metric/internal/aggregate/exponential_histogram_test.go:208
                	Error:      	Not equal:
                	            	expected: 4
                	            	actual  : +Inf
                	Test:       	TestExpoHistogramDataPointRecord/float64_MinMaxSum/[2_4_1_+Inf]
            exponential_histogram_test.go:210:
                	Error Trace:	/home/tyler/go/src/go.opentelemetry.io/otel/sdk/metric/internal/aggregate/exponential_histogram_test.go:210
                	Error:      	Not equal:
                	            	expected: 7
                	            	actual  : +Inf
                	Test:       	TestExpoHistogramDataPointRecord/float64_MinMaxSum/[2_4_1_+Inf]
        --- FAIL: TestExpoHistogramDataPointRecord/float64_MinMaxSum/[2_4_1_-Inf] (0.00s)
            exponential_histogram_test.go:209:
                	Error Trace:	/home/tyler/go/src/go.opentelemetry.io/otel/sdk/metric/internal/aggregate/exponential_histogram_test.go:209
                	Error:      	Not equal:
                	            	expected: 1
                	            	actual  : -Inf
                	Test:       	TestExpoHistogramDataPointRecord/float64_MinMaxSum/[2_4_1_-Inf]
            exponential_histogram_test.go:210:
                	Error Trace:	/home/tyler/go/src/go.opentelemetry.io/otel/sdk/metric/internal/aggregate/exponential_histogram_test.go:210
                	Error:      	Not equal:
                	            	expected: 7
                	            	actual  : -Inf
                	Test:       	TestExpoHistogramDataPointRecord/float64_MinMaxSum/[2_4_1_-Inf]
        --- FAIL: TestExpoHistogramDataPointRecord/float64_MinMaxSum/[2_4_1_NaN] (0.00s)
            exponential_histogram_test.go:210:
                	Error Trace:	/home/tyler/go/src/go.opentelemetry.io/otel/sdk/metric/internal/aggregate/exponential_histogram_test.go:210
                	Error:      	Not equal:
                	            	expected: 7
                	            	actual  : NaN
                	Test:       	TestExpoHistogramDataPointRecord/float64_MinMaxSum/[2_4_1_NaN]
    --- FAIL: TestExpoHistogramDataPointRecord/int64_MinMaxSum (0.00s)
        --- FAIL: TestExpoHistogramDataPointRecord/int64_MinMaxSum/[2_4_1_-9223372036854775808] (0.00s)
            exponential_histogram_test.go:209:
                	Error Trace:	/home/tyler/go/src/go.opentelemetry.io/otel/sdk/metric/internal/aggregate/exponential_histogram_test.go:209
                	Error:      	Not equal:
                	            	expected: 1
                	            	actual  : -9223372036854775808
                	Test:       	TestExpoHistogramDataPointRecord/int64_MinMaxSum/[2_4_1_-9223372036854775808]
            exponential_histogram_test.go:210:
                	Error Trace:	/home/tyler/go/src/go.opentelemetry.io/otel/sdk/metric/internal/aggregate/exponential_histogram_test.go:210
                	Error:      	Not equal:
                	            	expected: 7
                	            	actual  : -9223372036854775801
                	Test:       	TestExpoHistogramDataPointRecord/int64_MinMaxSum/[2_4_1_-9223372036854775808]
        --- FAIL: TestExpoHistogramDataPointRecord/int64_MinMaxSum/[2_4_1_-9223372036854775808]#01 (0.00s)
            exponential_histogram_test.go:209:
                	Error Trace:	/home/tyler/go/src/go.opentelemetry.io/otel/sdk/metric/internal/aggregate/exponential_histogram_test.go:209
                	Error:      	Not equal:
                	            	expected: 1
                	            	actual  : -9223372036854775808
                	Test:       	TestExpoHistogramDataPointRecord/int64_MinMaxSum/[2_4_1_-9223372036854775808]#01
            exponential_histogram_test.go:210:
                	Error Trace:	/home/tyler/go/src/go.opentelemetry.io/otel/sdk/metric/internal/aggregate/exponential_histogram_test.go:210
                	Error:      	Not equal:
                	            	expected: 7
                	            	actual  : -9223372036854775801
                	Test:       	TestExpoHistogramDataPointRecord/int64_MinMaxSum/[2_4_1_-9223372036854775808]#01
        --- FAIL: TestExpoHistogramDataPointRecord/int64_MinMaxSum/[2_4_1_-9223372036854775808]#02 (0.00s)
            exponential_histogram_test.go:209:
                	Error Trace:	/home/tyler/go/src/go.opentelemetry.io/otel/sdk/metric/internal/aggregate/exponential_histogram_test.go:209
                	Error:      	Not equal:
                	            	expected: 1
                	            	actual  : -9223372036854775808
                	Test:       	TestExpoHistogramDataPointRecord/int64_MinMaxSum/[2_4_1_-9223372036854775808]#02
            exponential_histogram_test.go:210:
                	Error Trace:	/home/tyler/go/src/go.opentelemetry.io/otel/sdk/metric/internal/aggregate/exponential_histogram_test.go:210
                	Error:      	Not equal:
                	            	expected: 7
                	            	actual  : -9223372036854775801
                	Test:       	TestExpoHistogramDataPointRecord/int64_MinMaxSum/[2_4_1_-9223372036854775808]#02
FAIL
FAIL	go.opentelemetry.io/otel/sdk/metric/internal/aggregate	0.008s
FAIL


MrAlias commented Aug 14, 2023

The implementation MUST maintain reasonable minimum and maximum scale parameters that the automatic scale parameter will not exceed. The maximum scale is defined by the MaxScale configuration parameter.

We rescale using this delta calculation:

// scaleChange returns the magnitude of the scale change needed to fit bin in the bucket.
func scaleChange(bin, startBin, length, maxSize int) int {
	if length == 0 {
		// No need to rescale if there are no buckets.
		return 0
	}

	low := startBin
	high := bin
	if startBin >= bin {
		low = bin
		high = startBin + length - 1
	}

	count := 0
	for high-low >= maxSize {
		low = low >> 1
		high = high >> 1
		count++
		if count > expoMaxScale-expoMinScale {
			return count
		}
	}
	return count
}

When that results in a scale less than -10, the implementation caps the scale at -10:

https://github.com/open-telemetry/opentelemetry-go/blob/3904523917fb412eae593ea07833bbede14d3ba7/sdk/metric/internal/aggregate/exponential_histogram.go#L147C27-L152

scaleChange never returns a negative value, meaning that, since the scale starts at the configured MaxScale, it can only decrease.

Therefore, the maximum scale is bounded by the configured MaxScale.
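The right-shifts inside scaleChange reflect how downscaling works: each one-step scale reduction merges adjacent bucket pairs, halving the bucket index. A standalone sketch of that mapping (downscaleBin is a hypothetical helper for illustration):

```go
package main

import "fmt"

// downscaleBin maps a bucket index at the current scale to its index after
// reducing the scale by delta. Go's >> on a signed int is an arithmetic
// shift, so negative bin indices (values below 1) stay correct.
func downscaleBin(bin, delta int) int {
	return bin >> delta
}

func main() {
	fmt.Println(downscaleBin(7, 1))  // 3: buckets 6 and 7 merge into bucket 3
	fmt.Println(downscaleBin(6, 2))  // 1: two scale reductions
	fmt.Println(downscaleBin(-1, 1)) // -1: buckets -2 and -1 merge into bucket -1
}
```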


MrAlias commented Aug 14, 2023

When the histogram contains not more than one value in either of the positive or negative ranges, the implementation SHOULD use the maximum scale.

The scaleChange function specifically handles the case for the first measurement (length is 0):

if length == 0 {
	// No need to rescale if there are no buckets.
	return 0
}


MrAlias commented Aug 14, 2023

Implementations SHOULD adjust the histogram scale as necessary to maintain the best resolution possible, within the constraint of maximum size (max number of buckets). Best resolution (highest scale) is achieved when the number of positive or negative range buckets exceeds half the maximum size, such that increasing scale by one would not be possible given the size constraint.

That is what our implementation does prior to recording a measurement:

// scaleChange returns the magnitude of the scale change needed to fit bin in the bucket.
func scaleChange(bin, startBin, length, maxSize int) int {
	if length == 0 {
		// No need to rescale if there are no buckets.
		return 0
	}

	low := startBin
	high := bin
	if startBin >= bin {
		low = bin
		high = startBin + length - 1
	}

	count := 0
	for high-low >= maxSize {
		low = low >> 1
		high = high >> 1
		count++
		if count > expoMaxScale-expoMinScale {
			return count
		}
	}
	return count
}

// If the new bin would make the counts larger than maxScale, we need to
// downscale current measurements.
if scaleDelta := scaleChange(bin, bucket.startBin, len(bucket.counts), p.maxSize); scaleDelta > 0 {
	if p.scale-scaleDelta < expoMinScale {
		// With a scale of -10 there are only two buckets for the whole range of float64 values.
		// This can only happen if there is a max size of 1.
		otel.Handle(errors.New("exponential histogram scale underflow"))
		return
	}
	// Downscale
	p.scale -= scaleDelta
	p.posBuckets.downscale(scaleDelta)
	p.negBuckets.downscale(scaleDelta)
	bin = getBin(absV, p.scale)
}


@MrAlias MrAlias added this to the v1.17.0/v0.40.0 milestone Aug 21, 2023

MrAlias commented Aug 21, 2023

Done

@MrAlias MrAlias closed this as completed Aug 21, 2023