
Logs are duplicated when getting label __meta_kubernetes_pod_ready #2211

Open
antoineozenne opened this issue Dec 3, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@antoineozenne

What's wrong?

I don't know whether to call this a bug, or whether the documentation should be updated to explain this specific behavior.

When using discovery.relabel to map the label __meta_kubernetes_pod_ready to a pod_ready label, there is strange behavior when a pod is starting: the first log lines are scraped twice, with identical labels and values except for the pod_ready label (first false, then true).

Is this expected? The Alloy log seems to indicate an error. See the steps and logs below.
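If the duplication is caused by the target's label set changing when pod_ready flips from false to true (which would make Alloy treat it as a new stream and re-read the log from the beginning), one possible workaround would be to keep pod_ready out of the log labels. A minimal, untested sketch of such a labelmap rule (pod_phase could presumably trigger the same effect if it changes while logs are tailed, so it may also be worth excluding):

      discovery.relabel "pod_logs" {
        targets = discovery.kubernetes.pods.targets
        rule {
          // Same labelmap as in the reproduction config, but without pod_ready,
          // so the stream's label set stays stable while the pod becomes ready.
          regex       = "__meta_kubernetes_(namespace|pod_name|pod_label_app_kubernetes_io_name|pod_label_app_kubernetes_io_instance|pod_container_name|pod_container_image|pod_phase)"
          action      = "labelmap"
          replacement = "$1"
        }
      }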

Steps to reproduce

  1. Install the Alloy Helm chart with the following values:
alloy:
  configMap:
    content: |
      remote.kubernetes.secret "my_alloy_secret" {
        namespace = {{ .Release.Namespace | quote }}
        name = "my-alloy"
      }

      loki.write "default" {
        endpoint {
          url = "https://XXX.grafana.net/loki/api/v1/push"

          basic_auth {
            username = "XXX"
            password = remote.kubernetes.secret.my_alloy_secret.data["loki-basic-auth-password"]
          }
        }
      }

      prometheus.remote_write "default" {
        endpoint {
          url = "https://XXX.grafana.net/api/prom/push"

          basic_auth {
            username = "XXX"
            password = remote.kubernetes.secret.my_alloy_secret.data["prometheus-basic-auth-password"]
          }
        }
      }

      discovery.kubernetes "pods" {
      	role = "pod"

        namespaces {
          names = ["test-foo"]
        }

        selectors {
          role = "pod"
          label = "app.kubernetes.io/name=hello-world"
        }
      }

      discovery.kubernetes "nodes" {
      	role = "node"
      }

      discovery.kubernetes "services" {
      	role = "service"
      }

      discovery.kubernetes "endpoints" {
      	role = "endpoints"
      }

      discovery.kubernetes "endpointslices" {
      	role = "endpointslice"
      }

      discovery.kubernetes "ingresses" {
      	role = "ingress"
      }

      discovery.relabel "pod_logs" {
        targets = discovery.kubernetes.pods.targets
        rule {
          regex = "__meta_kubernetes_(namespace|pod_name|pod_label_app_kubernetes_io_name|pod_label_app_kubernetes_io_instance|pod_container_name|pod_container_image|pod_ready|pod_phase)"
          action = "labelmap"
          replacement = "$1"
        }
      }

      discovery.relabel "pod_metrics" {
        targets = discovery.kubernetes.pods.targets
        rule {
          regex = "__meta_kubernetes_(namespace|pod_name|pod_label_app_kubernetes_io_name|pod_label_app_kubernetes_io_instance|pod_container_name|pod_container_image|pod_ready|pod_phase)"
          action = "labelmap"
          replacement = "$1"
        }
        rule {
          source_labels = ["__meta_kubernetes_pod_annotation_prometheus_io_path"]
          target_label = "__metrics_path__"
          regex = "(.+)"
          action = "replace"
        }
      }

      loki.source.kubernetes "pod_logs" {
        targets = discovery.relabel.pod_logs.output
        forward_to = [loki.process.pod_logs.receiver]
      }

      loki.process "pod_logs" {
        forward_to = [loki.write.default.receiver]
        stage.static_labels {
          values = {
            cluster = "mycluster",
          }
        }
      }

      prometheus.scrape "pod_metrics" {
        targets = discovery.relabel.pod_metrics.output
        forward_to = [prometheus.remote_write.default.receiver]
      }
  2. Install the chart hello-world in the namespace test-foo (release name test-foo).
  3. See the Alloy logs:
ts=2024-12-03T15:55:41.633082769Z level=info msg="tailer running" target=test-foo/test-foo-hello-world-94cf9bbb-7bplw:hello-world component_path=/ component_id=loki.source.kubernetes.pod_logs
ts=2024-12-03T15:55:41.648511559Z level=info msg="opened log stream" target=test-foo/test-foo-hello-world-94cf9bbb-7bplw:hello-world component_path=/ component_id=loki.source.kubernetes.pod_logs "start time"=1970-01-01T00:00:00.000Z
ts=2024-12-03T15:55:46.632422555Z level=info msg="tailer running" target=test-foo/test-foo-hello-world-94cf9bbb-7bplw:hello-world component_path=/ component_id=loki.source.kubernetes.pod_logs
ts=2024-12-03T15:55:46.632459929Z level=warn msg="could not determine if container terminated; will retry tailing" target=test-foo/test-foo-hello-world-94cf9bbb-7bplw:hello-world component_path=/ component_id=loki.source.kubernetes.pod_logs err="client rate limiter Wait returned an error: context canceled"
ts=2024-12-03T15:55:46.632484785Z level=info msg="tailer exited" target=test-foo/test-foo-hello-world-94cf9bbb-7bplw:hello-world component_path=/ component_id=loki.source.kubernetes.pod_logs
ts=2024-12-03T15:55:46.63260569Z level=info msg="removing entry from positions file" component_path=/ component_id=loki.source.kubernetes.pod_logs path=cursor-test-foo/test-foo-hello-world-94cf9bbb-7bplw:hello-world:3ad43069-d0be-4570-b046-9a5eab3b4e75 labels="{instance=\"test-foo/test-foo-hello-world-94cf9bbb-7bplw:hello-world\", job=\"loki.source.kubernetes.pod_logs\", namespace=\"test-foo\", pod_container_image=\"nginx:1.16.0\", pod_container_name=\"hello-world\", pod_label_app_kubernetes_io_instance=\"test-foo\", pod_label_app_kubernetes_io_name=\"hello-world\", pod_name=\"test-foo-hello-world-94cf9bbb-7bplw\", pod_phase=\"Running\", pod_ready=\"false\"}"
ts=2024-12-03T15:55:46.646883466Z level=info msg="opened log stream" target=test-foo/test-foo-hello-world-94cf9bbb-7bplw:hello-world component_path=/ component_id=loki.source.kubernetes.pod_logs "start time"=1970-01-01T00:00:00.000Z
ts=2024-12-03T16:00:05.647336464Z level=warn msg="tailer stopped; will retry" target=test-foo/test-foo-hello-world-94cf9bbb-7bplw:hello-world component_path=/ component_id=loki.source.kubernetes.pod_logs err="http2: response body closed"
ts=2024-12-03T16:00:05.676138792Z level=info msg="opened log stream" target=test-foo/test-foo-hello-world-94cf9bbb-7bplw:hello-world component_path=/ component_id=loki.source.kubernetes.pod_logs "start time"=2024-12-03T16:00:05.647Z
  4. See the duplicated logs in Grafana (only the pod_ready label changes):
[
    {
        "line": "10.240.0.69 - - [03/Dec/2024:15:55:45 +0000] \"GET / HTTP/1.1\" 200 612 \"-\" \"kube-probe/1.27\" \"-\"\n",
        "timestamp": "1733241345998266862",
        "fields": {
            "cluster": "mycluster",
            "detected_level": "unknown",
            "instance": "test-foo/test-foo-hello-world-94cf9bbb-7bplw:hello-world",
            "job": "loki.source.kubernetes.pod_logs",
            "namespace": "test-foo",
            "pod_container_image": "nginx:1.16.0",
            "pod_container_name": "hello-world",
            "pod_label_app_kubernetes_io_instance": "test-foo",
            "pod_label_app_kubernetes_io_name": "hello-world",
            "pod_name": "test-foo-hello-world-94cf9bbb-7bplw",
            "pod_phase": "Running",
            "pod_ready": "true",
            "service_name": "loki.source.kubernetes.pod_logs"
        }
    },
    {
        "line": "10.240.0.69 - - [03/Dec/2024:15:55:45 +0000] \"GET / HTTP/1.1\" 200 612 \"-\" \"kube-probe/1.27\" \"-\"\n",
        "timestamp": "1733241345998266862",
        "fields": {
            "cluster": "mycluster",
            "detected_level": "unknown",
            "instance": "test-foo/test-foo-hello-world-94cf9bbb-7bplw:hello-world",
            "job": "loki.source.kubernetes.pod_logs",
            "namespace": "test-foo",
            "pod_container_image": "nginx:1.16.0",
            "pod_container_name": "hello-world",
            "pod_label_app_kubernetes_io_instance": "test-foo",
            "pod_label_app_kubernetes_io_name": "hello-world",
            "pod_name": "test-foo-hello-world-94cf9bbb-7bplw",
            "pod_phase": "Running",
            "pod_ready": "false",
            "service_name": "loki.source.kubernetes.pod_logs"
        }
    },
    {
        "line": "10.240.0.69 - - [03/Dec/2024:15:55:45 +0000] \"GET / HTTP/1.1\" 200 612 \"-\" \"kube-probe/1.27\" \"-\"\n",
        "timestamp": "1733241345998134553",
        "fields": {
            "cluster": "mycluster",
            "detected_level": "unknown",
            "instance": "test-foo/test-foo-hello-world-94cf9bbb-7bplw:hello-world",
            "job": "loki.source.kubernetes.pod_logs",
            "namespace": "test-foo",
            "pod_container_image": "nginx:1.16.0",
            "pod_container_name": "hello-world",
            "pod_label_app_kubernetes_io_instance": "test-foo",
            "pod_label_app_kubernetes_io_name": "hello-world",
            "pod_name": "test-foo-hello-world-94cf9bbb-7bplw",
            "pod_phase": "Running",
            "pod_ready": "true",
            "service_name": "loki.source.kubernetes.pod_logs"
        }
    },
    {
        "line": "10.240.0.69 - - [03/Dec/2024:15:55:45 +0000] \"GET / HTTP/1.1\" 200 612 \"-\" \"kube-probe/1.27\" \"-\"\n",
        "timestamp": "1733241345998134553",
        "fields": {
            "cluster": "mycluster",
            "detected_level": "unknown",
            "instance": "test-foo/test-foo-hello-world-94cf9bbb-7bplw:hello-world",
            "job": "loki.source.kubernetes.pod_logs",
            "namespace": "test-foo",
            "pod_container_image": "nginx:1.16.0",
            "pod_container_name": "hello-world",
            "pod_label_app_kubernetes_io_instance": "test-foo",
            "pod_label_app_kubernetes_io_name": "hello-world",
            "pod_name": "test-foo-hello-world-94cf9bbb-7bplw",
            "pod_phase": "Running",
            "pod_ready": "false",
            "service_name": "loki.source.kubernetes.pod_logs"
        }
    },
    {
        "line": "10.240.0.69 - - [03/Dec/2024:15:55:41 +0000] \"GET / HTTP/1.1\" 200 612 \"-\" \"kube-probe/1.27\" \"-\"\n",
        "timestamp": "1733241341908108141",
        "fields": {
            "cluster": "mycluster",
            "detected_level": "unknown",
            "instance": "test-foo/test-foo-hello-world-94cf9bbb-7bplw:hello-world",
            "job": "loki.source.kubernetes.pod_logs",
            "namespace": "test-foo",
            "pod_container_image": "nginx:1.16.0",
            "pod_container_name": "hello-world",
            "pod_label_app_kubernetes_io_instance": "test-foo",
            "pod_label_app_kubernetes_io_name": "hello-world",
            "pod_name": "test-foo-hello-world-94cf9bbb-7bplw",
            "pod_phase": "Running",
            "pod_ready": "true",
            "service_name": "loki.source.kubernetes.pod_logs"
        }
    },
    {
        "line": "10.240.0.69 - - [03/Dec/2024:15:55:41 +0000] \"GET / HTTP/1.1\" 200 612 \"-\" \"kube-probe/1.27\" \"-\"\n",
        "timestamp": "1733241341908108141",
        "fields": {
            "cluster": "mycluster",
            "detected_level": "unknown",
            "instance": "test-foo/test-foo-hello-world-94cf9bbb-7bplw:hello-world",
            "job": "loki.source.kubernetes.pod_logs",
            "namespace": "test-foo",
            "pod_container_image": "nginx:1.16.0",
            "pod_container_name": "hello-world",
            "pod_label_app_kubernetes_io_instance": "test-foo",
            "pod_label_app_kubernetes_io_name": "hello-world",
            "pod_name": "test-foo-hello-world-94cf9bbb-7bplw",
            "pod_phase": "Running",
            "pod_ready": "false",
            "service_name": "loki.source.kubernetes.pod_logs"
        }
    }
]

System information

No response

Software version

Grafana Alloy v1.5.0

Configuration


Logs


antoineozenne added the bug label on Dec 3, 2024
@ptodev
Contributor

ptodev commented Dec 11, 2024

Hi! Thank you for reporting this. It might genuinely be a bug. I could try to reproduce it later using the information you have provided. These are the things I'd check:

  • Making sure there aren't 2 Alloy instances running at the same time.
  • Checking the Alloy /metrics endpoint to see how many logs were sent.
  • Enabling debug logging.
  • Using a loki.echo component to make sure the Alloy instance really does send the logs we think it sends (see the sketch after this comment).

If I don't get around to replicating this, and if someone would like to check the above list of things, please feel free to do so 😊
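For the loki.echo check, a minimal sketch of how it could be wired into the configuration from the issue (the component name "debug" is illustrative): loki.echo prints the log entries it receives to Alloy's stdout, so adding its receiver alongside the existing loki.write one makes any duplicated entries visible locally.

      // Sketch only: echo every entry that loki.process emits, in addition
      // to shipping it to Loki, so duplicates show up in the Alloy output.
      loki.echo "debug" { }

      loki.process "pod_logs" {
        forward_to = [
          loki.write.default.receiver,
          loki.echo.debug.receiver,
        ]
        stage.static_labels {
          values = {
            cluster = "mycluster",
          }
        }
      }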


This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it.
If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue.
The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity.
Thank you for your contributions!

@antoineozenne
Author

/no-stale
