feat: [receiver/k8scluster] Add optional k8s.container.status.waiting metric #35668

Draft: wants to merge 1 commit into main
27 changes: 27 additions & 0 deletions .chloggen/crashloop.yaml
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: k8sclusterreceiver

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: "Add optional k8s.container.status.waiting metric for detecting CrashLoopBackOff containers"

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [32457]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: [user]
8 changes: 8 additions & 0 deletions receiver/k8sclusterreceiver/documentation.md
@@ -396,6 +396,14 @@ metrics:
    enabled: true
```

### k8s.container.status.waiting

Whether container is in waiting state. (0 for no, 1 for yes)

| Unit | Metric Type | Value Type |
| ---- | ----------- | ---------- |
| | Gauge | Int |

### k8s.node.condition

The condition of a particular Node.
@@ -70,9 +70,12 @@ func RecordSpecMetrics(logger *zap.Logger, mb *imetadata.MetricsBuilder, c corev
			imageStr = cs.Image
			mb.RecordK8sContainerRestartsDataPoint(ts, int64(cs.RestartCount))
			mb.RecordK8sContainerReadyDataPoint(ts, boolToInt64(cs.Ready))
			mb.RecordK8sContainerStatusWaitingDataPoint(ts, boolToInt64(cs.State.Waiting != nil))

			if cs.LastTerminationState.Terminated != nil {
				rb.SetK8sContainerStatusLastTerminatedReason(cs.LastTerminationState.Terminated.Reason)
			}

			break
		}
	}
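
The new data point is 1 whenever `cs.State.Waiting` is non-nil, whatever the waiting reason. A minimal, self-contained sketch of that mapping follows. `ContainerState` and `ContainerStateWaiting` are trimmed stand-ins for the `corev1` types, `boolToInt64` is assumed to behave as its name suggests (its body is not part of this diff), and `waitingValue` is a hypothetical helper used only for illustration.

```go
package main

import "fmt"

// ContainerStateWaiting is a trimmed stand-in for corev1.ContainerStateWaiting
// so the sketch compiles without the Kubernetes API module.
type ContainerStateWaiting struct {
	Reason string
}

// ContainerState is a trimmed stand-in for corev1.ContainerState.
type ContainerState struct {
	Waiting *ContainerStateWaiting
}

// boolToInt64 is assumed to match the helper called in the diff:
// true becomes 1, false becomes 0.
func boolToInt64(b bool) int64 {
	if b {
		return 1
	}
	return 0
}

// waitingValue (hypothetical) returns the gauge value that would be recorded
// for k8s.container.status.waiting given a container state.
func waitingValue(s ContainerState) int64 {
	return boolToInt64(s.Waiting != nil)
}

func main() {
	states := map[string]ContainerState{
		"running":           {},
		"CrashLoopBackOff":  {Waiting: &ContainerStateWaiting{Reason: "CrashLoopBackOff"}},
		"ContainerCreating": {Waiting: &ContainerStateWaiting{Reason: "ContainerCreating"}},
	}
	for name, s := range states {
		fmt.Printf("%s waiting=%d\n", name, waitingValue(s))
	}
}
```

Because waiting reasons such as `ContainerCreating` or `ImagePullBackOff` also produce 1, a consumer interested only in crash loops would likely need to combine this metric with `k8s.container.restarts` or the last-terminated reason.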
@@ -17,6 +17,8 @@ all_set:
      enabled: true
    k8s.container.restarts:
      enabled: true
    k8s.container.status.waiting:
      enabled: true
    k8s.container.storage_limit:
      enabled: true
    k8s.container.storage_request:
@@ -182,6 +184,8 @@ none_set:
      enabled: false
    k8s.container.restarts:
      enabled: false
    k8s.container.status.waiting:
      enabled: false
    k8s.container.storage_limit:
      enabled: false
    k8s.container.storage_request:
30 changes: 30 additions & 0 deletions receiver/k8sclusterreceiver/internal/pod/pods_test.go
@@ -68,6 +68,7 @@ func TestPodStatusReasonAndContainerMetricsReportCPUMetrics(t *testing.T) {

	mbc := metadata.DefaultMetricsBuilderConfig()
	mbc.Metrics.K8sPodStatusReason.Enabled = true
	mbc.Metrics.K8sContainerStatusWaiting.Enabled = true
	mbc.ResourceAttributes.K8sPodQosClass.Enabled = true
	mbc.ResourceAttributes.K8sContainerStatusLastTerminatedReason.Enabled = true
	ts := pcommon.Timestamp(time.Now().UnixNano())
@@ -87,6 +88,35 @@ func TestPodStatusReasonAndContainerMetricsReportCPUMetrics(t *testing.T) {
	)
}

func TestPodStatusWaitingAndContainerMetricsReportCPUMetrics(t *testing.T) {
	pod := testutils.NewPodWithContainer(
		"1",
		testutils.NewPodSpecWithContainer("container-name"),
		testutils.NewCrashLoopPodStatusWithContainer("container-name", containerIDWithPreifx("container-id")),
	)

	mbc := metadata.DefaultMetricsBuilderConfig()
	mbc.Metrics.K8sPodStatusReason.Enabled = true
	mbc.Metrics.K8sContainerStatusWaiting.Enabled = true
	mbc.ResourceAttributes.K8sPodQosClass.Enabled = true
	mbc.ResourceAttributes.K8sContainerStatusLastTerminatedReason.Enabled = true
	ts := pcommon.Timestamp(time.Now().UnixNano())
	mb := metadata.NewMetricsBuilder(mbc, receivertest.NewNopSettings())
	RecordMetrics(zap.NewNop(), mb, pod, ts)
	m := mb.Emit()

	expected, err := golden.ReadMetrics(filepath.Join("testdata", "expected_crashloop.yaml"))
	require.NoError(t, err)
	require.NoError(t, pmetrictest.CompareMetrics(expected, m,
		pmetrictest.IgnoreTimestamp(),
		pmetrictest.IgnoreStartTimestamp(),
		pmetrictest.IgnoreResourceMetricsOrder(),
		pmetrictest.IgnoreMetricsOrder(),
		pmetrictest.IgnoreScopeMetricsOrder(),
	),
	)
}

var containerIDWithPreifx = func(containerID string) string {
	return "docker://" + containerID
}
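
`testutils.NewCrashLoopPodStatusWithContainer` is not shown in this diff. Judging from the expected metrics in `expected_crashloop.yaml` (restarts 3, ready 1, waiting 1, last terminated reason `Error`), the container status it builds presumably resembles the sketch below. The types are local stand-ins for the `corev1` ones, `newCrashLoopStatus` is a hypothetical name rather than the helper's actual implementation, and the `CrashLoopBackOff` waiting reason is an assumption, since the reason is not visible in the expected output.

```go
package main

import "fmt"

// Trimmed stand-ins for the corev1 container status types, so the sketch
// compiles without the Kubernetes API module.
type ContainerStateWaiting struct{ Reason string }

type ContainerStateTerminated struct{ Reason string }

type ContainerState struct {
	Waiting    *ContainerStateWaiting
	Terminated *ContainerStateTerminated
}

type ContainerStatus struct {
	Name                 string
	ContainerID          string
	Ready                bool
	RestartCount         int32
	State                ContainerState
	LastTerminationState ContainerState
}

// newCrashLoopStatus (hypothetical) builds a status whose recorded metrics
// would match the values seen in expected_crashloop.yaml.
func newCrashLoopStatus(name, containerID string) ContainerStatus {
	return ContainerStatus{
		Name:         name,
		ContainerID:  containerID,
		Ready:        true, // expected k8s.container.ready is 1
		RestartCount: 3,    // expected k8s.container.restarts is 3
		State: ContainerState{
			// Assumed reason; only the non-nil Waiting field drives the metric.
			Waiting: &ContainerStateWaiting{Reason: "CrashLoopBackOff"},
		},
		LastTerminationState: ContainerState{
			Terminated: &ContainerStateTerminated{Reason: "Error"},
		},
	}
}

func main() {
	cs := newCrashLoopStatus("container-name", "docker://container-id")
	fmt.Println("restarts:", cs.RestartCount)
	fmt.Println("waiting:", cs.State.Waiting != nil)
	fmt.Println("last terminated reason:", cs.LastTerminationState.Terminated.Reason)
}
```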
@@ -0,0 +1,101 @@
resourceMetrics:
  - resource:
      attributes:
        - key: k8s.namespace.name
          value:
            stringValue: test-namespace
        - key: k8s.node.name
          value:
            stringValue: test-node
        - key: k8s.pod.name
          value:
            stringValue: test-pod-1
        - key: k8s.pod.qos_class
          value:
            stringValue: BestEffort
        - key: k8s.pod.uid
          value:
            stringValue: test-pod-1-uid
    schemaUrl: https://opentelemetry.io/schemas/1.18.0
    scopeMetrics:
      - metrics:
          - description: Current phase of the pod (1 - Pending, 2 - Running, 3 - Succeeded, 4 - Failed, 5 - Unknown)
            gauge:
              dataPoints:
                - asInt: "2"
            name: k8s.pod.phase
            unit: ""
          - description: Current status reason of the pod (1 - Evicted, 2 - NodeAffinity, 3 - NodeLost, 4 - Shutdown, 5 - UnexpectedAdmissionError, 6 - Unknown)
            gauge:
              dataPoints:
                - asInt: "6"
            name: k8s.pod.status_reason
            unit: ""
        scope:
          name: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/k8sclusterreceiver
          version: latest
  - resource:
      attributes:
        - key: container.id
          value:
            stringValue: container-id
        - key: container.image.name
          value:
            stringValue: container-image-name
        - key: container.image.tag
          value:
            stringValue: latest
        - key: k8s.container.name
          value:
            stringValue: container-name
        - key: k8s.container.status.last_terminated_reason
          value:
            stringValue: Error
        - key: k8s.namespace.name
          value:
            stringValue: test-namespace
        - key: k8s.node.name
          value:
            stringValue: test-node
        - key: k8s.pod.name
          value:
            stringValue: test-pod-1
        - key: k8s.pod.uid
          value:
            stringValue: test-pod-1-uid
    schemaUrl: https://opentelemetry.io/schemas/1.18.0
    scopeMetrics:
      - metrics:
          - description: How many times the container has restarted in the recent past. This value is pulled directly from the K8s API and the value can go indefinitely high and be reset to 0 at any time depending on how your kubelet is configured to prune dead containers. It is best to not depend too much on the exact value but rather look at it as either == 0, in which case you can conclude there were no restarts in the recent past, or > 0, in which case you can conclude there were restarts in the recent past, and not try and analyze the value beyond that.
            gauge:
              dataPoints:
                - asInt: "3"
            name: k8s.container.restarts
            unit: "{restart}"
          - description: Whether a container has passed its readiness probe (0 for no, 1 for yes)
            gauge:
              dataPoints:
                - asInt: "1"
            name: k8s.container.ready
            unit: ""
          - description: Whether container is in waiting state. (0 for no, 1 for yes)
            gauge:
              dataPoints:
                - asInt: "1"
            name: k8s.container.status.waiting
            unit: ""
          - description: Resource requested for the container. See https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#resourcerequirements-v1-core for details
            gauge:
              dataPoints:
                - asDouble: 10
            name: k8s.container.cpu_request
            unit: "{cpu}"
          - description: Maximum resource limit set for the container. See https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#resourcerequirements-v1-core for details
            gauge:
              dataPoints:
                - asDouble: 20
            name: k8s.container.cpu_limit
            unit: "{cpu}"
        scope:
          name: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/k8sclusterreceiver
          version: latest
@@ -78,6 +78,12 @@ resourceMetrics:
                - asInt: "1"
            name: k8s.container.ready
            unit: ""
          - description: Whether container is in waiting state. (0 for no, 1 for yes)
            gauge:
              dataPoints:
                - asInt: "0"
            name: k8s.container.status.waiting
            unit: ""
          - description: Resource requested for the container. See https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#resourcerequirements-v1-core for details
            gauge:
              dataPoints: