Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix event mapping implementation for statsd module #36925

Merged
merged 14 commits into from
Oct 30, 2023

Conversation

ritalwar
Copy link
Contributor

@ritalwar ritalwar commented Oct 21, 2023

  • Bug

Proposed commit message

The root cause of the problem is the implementation of the event mapping method.

The data periodically sent by Airflow is not limited to a single DAG; it can encompass information from all currently running DAGs. This collective data is gathered into a group and subsequently processed to generate an event. The issue was traced back to a for loop used to iterate through all the key-value pairs within this group to create a single event. Because the data originates from multiple DAGs, creating a single event resulted in the data from one DAG overwriting the data from another. This led to the chaotic mixing of metrics, causing the problem.

The new implementation matches the approach used in the statsd_exporter. Now, each key-value pair and its associated attributes are treated as individual events. As a result, each key-value pair is processed to create distinct events, with only the related fields present in each event. This modification effectively resolves the issue and offers greater flexibility when using controls and filters to generate visualizations for associated data.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  1. Enable the statsd module.
  2. When testing any module, such as airflow, configure the statsd mapping.
  3. Verify the data and mappings in Elasticsearch.

Related issues

Screenshots

@ritalwar ritalwar added the bug label Oct 21, 2023
@ritalwar ritalwar requested a review from a team as a code owner October 21, 2023 17:35
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Oct 21, 2023
@ritalwar ritalwar self-assigned this Oct 21, 2023
@ritalwar ritalwar added the Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team label Oct 21, 2023
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Oct 21, 2023
@mergify
Copy link
Contributor

mergify bot commented Oct 21, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @ritalwar? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@shmsr shmsr changed the title Update eventmapping implementation for statsd module Update event mapping implementation for statsd module Oct 21, 2023
@elasticmachine
Copy link
Collaborator

elasticmachine commented Oct 21, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Duration: 56 min 6 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@ritalwar ritalwar changed the title Update event mapping implementation for statsd module Fix event mapping implementation for statsd module Oct 21, 2023
@devamanv devamanv requested a review from a team October 22, 2023 19:18
@lalit-satapathy
Copy link
Contributor

Hi @ritalwar,
Can we summarise the the issue/problem and solution being done here (also an old document vs new document will help)?

@ritalwar
Copy link
Contributor Author

Issue details:
Using statsd.mappings to link fields with labels and attributes, it was noticed that fields containing multiple <label_placeholders> were causing jumbled metrics data in Elasticsearch. This led to incorrect values being included in the event JSON. This issue arose while utilizing the Metricbeat statsd module to send metrics from Apache Airflow to statsd.

Root Cause:
During the code debugging process, we pinpointed the root cause of this problem to be the implementation of the event mapping method.

Here are the specifics: The data periodically sent by Airflow is not limited to a single DAG; it can encompass information from all currently running DAGs. This collective data is gathered into a group and subsequently processed to generate an event. The issue was traced back to a for loop used to iterate through all the key-value pairs within this group to create a single event. Because the data originates from multiple DAGs, creating a single event resulted in the data from one DAG overwriting the data from another. This led to the chaotic mixing of metrics, causing the problem at hand.

Solution:

We've updated the implementation to match the approach used in the statsd exporter. Now, each key-value pair and its associated attributes are treated as individual events. As a result, each key-value pair is processed to create distinct events, with only the related fields present in each event. This modification effectively resolves the issue and offers greater flexibility when using controls and filters to generate visualizations for related data.

@ritalwar
Copy link
Contributor Author

ritalwar commented Oct 24, 2023

For this configuration:

- metric: 'dag_processing.last_duration.<dag_file>'
        labels:
          - attr: dag_file
            field: dag_file
        value:
          field: dag_last_duration

Document generated before the change, with all data combined into a single event:

image

And after change, document featuring only one related field and its attributes within a single event:

image

@ritalwar
Copy link
Contributor Author

Sample dashboard Examples:

image
image

@ritalwar ritalwar requested a review from devamanv October 24, 2023 07:07
@ali786XI
Copy link
Contributor

Testing Details

Test Setup

  • Used a mock server to send custom metrics to StatsD module and verify the results

Test Case Scenarios

Test Case Summaries Status
Verify that the metrics are getting mapped correctly in case of a single placeholder value Pass 💚
Verify that the metrics are getting mapped correctly in case of multiple placeholders Pass 💚
Check that the metrics are getting mapped correctly in case of no placeholders Pass 💚
Check that the metrics are received individually in single documents in case of same placeholders Pass 💚
Check that the labels are also getting mapped correctly Pass 💚
Check that there are no errors related to the mappings while running the beats module Pass 💚

@devamanv
Copy link
Contributor

Just left a minor comment, all other changes look good to me.

@agithomas
Copy link
Contributor

The change looks good to me. However, as discussed, i would recommend to have it tested with Graphite for additional test coverage, if time permits.

@aliabbas-elastic ,
can you please validate the data grouping logic post the change?

Copy link
Contributor

@agithomas agithomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ritalwar
Copy link
Contributor Author

@aliabbas-elastic , can you please validate the data grouping logic post the change?

Yes, we can check if statsd.mappings work well with Graphite metrics and if it's logical to have separate events in there too.

@ali786XI
Copy link
Contributor

ali786XI commented Oct 27, 2023

@aliabbas-elastic ,
can you please validate the data grouping logic post the change?

Sure @agithomas I am looking into it.

@ali786XI
Copy link
Contributor

ali786XI commented Oct 27, 2023

Brought up Graphite with default metrics in it

image

Defined mappings for the same in metricbeat.yml and everything seems logical in implementation as metrics are getting separated in individual documents.

Refer one sample_event.json for carbon.aggregator.2aeab3d714e5-a.destinations.activeConnections

Corresponding mapping for the metric in metricbeat.yml

  - metric: carbon.aggregator.<agent_id>.destinations.activeConnections
    labels:
      - attr: agent_id
        field: agent_id
    value:
      field: aggregator_activeConnections

Sample Event for the same:-

{
  "_index": ".ds-metricbeat-8.12.0-2023.10.27-000001",
  "_id": "D-6tcIsBueTgBUaS5yc4",
  "_version": 1,
  "_score": 0,
  "_source": {
    "@timestamp": "2023-10-27T10:28:46.030Z",
    "service": {
      "type": "statsd"
    },
    "statsd": {
      "agent_id": "2aeab3d714e5-a",
      "aggregator_activeConnections": {
        "value": 3
      }
    },
    "labels": {},
    "ecs": {
      "version": "8.0.0"
    },
    "host": {
      "name": "co7"
    },
    "agent": {
      "name": "co7",
      "type": "metricbeat",
      "version": "8.12.0",
      "ephemeral_id": "127a9de0-290b-4033-b209-fe29aa3b2a13",
      "id": "78f2c980-468e-44b3-9a85-4c47fe79c50f"
    },
    "event": {
      "module": "statsd",
      "dataset": "statsd"
    },
    "metricset": {
      "name": "server"
    }
  },
  "fields": {
    "statsd.agent_id": [
      "2aeab3d714e5-a"
    ],
    "statsd.aggregator_activeConnections.value": [
      3
    ],
    "kibana_stats.timestamp": [
      "2023-10-27T10:28:46.030Z"
    ],
    "agent.hostname": [
      "co7"
    ],
    "metricset.name": [
      "server"
    ],
    "service.type": [
      "statsd"
    ],
    "agent.type": [
      "metricbeat"
    ],
    "@timestamp": [
      "2023-10-27T10:28:46.030Z"
    ],
    "logstash_stats.timestamp": [
      "2023-10-27T10:28:46.030Z"
    ],
    "agent.id": [
      "78f2c980-468e-44b3-9a85-4c47fe79c50f"
    ],
    "event.module": [
      "statsd"
    ],
    "ecs.version": [
      "8.0.0"
    ],
    "agent.ephemeral_id": [
      "127a9de0-290b-4033-b209-fe29aa3b2a13"
    ],
    "agent.name": [
      "co7"
    ],
    "agent.version": [
      "8.12.0"
    ],
    "host.name": [
      "co7"
    ],
    "beats_state.timestamp": [
      "2023-10-27T10:28:46.030Z"
    ],
    "event.dataset": [
      "statsd"
    ],
    "beats_state.state.host.name": [
      "co7"
    ],
    "timestamp": [
      "2023-10-27T10:28:46.030Z"
    ]
  }
}

cc:- @agithomas @ritalwar

for idx, tagGroup := range groups {

mapstrTags := mapstr.M{}
// If there are no groups, return an empty slice.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit: the comment does not match the code - we aren't returning an empty slice here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

}
events = append(events, &mb.Event{
MetricSetFields: ms,
RootFields: mapstr.M{"labels": mapstrTags},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to worry about increasing storage cost by duplicating the tags across many more metric documents?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tags aren't always there. We're not adding them ourselves; they are optional. In fact, I couldn't find them in the case of Airflow, so there is no extra overhead involved. More about StatsD tags in general, as they are supported by every StatsD implementation, such as Datadog and Atlassian. Same we are supporting this here.

<-done

assert.Len(t, events, 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this test fail without the new changes in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed running the test before making this change, but I've run it now and made the necessary updates(the tags needed to be identical for both the events to test this change). Thanks!

@ritalwar ritalwar force-pushed the statsd_mapping_3932 branch from f46ecaa to b2cb507 Compare October 30, 2023 05:58
@ritalwar ritalwar added backport-v8.11.0 Automated backport with mergify backport and removed backport labels Oct 30, 2023
@ritalwar ritalwar merged commit adcd4b0 into elastic:main Oct 30, 2023
8 checks passed
mergify bot pushed a commit that referenced this pull request Oct 30, 2023
* Fix eventmapping implementation for statsd module.

(cherry picked from commit adcd4b0)
ritalwar pushed a commit that referenced this pull request Oct 30, 2023
* Fix eventmapping implementation for statsd module.
Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024
* Fix eventmapping implementation for statsd module.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v8.11.0 Automated backport with mergify bug Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants