add instrumentation to allow sending kafka payload size on produce #6228

Merged: vandonr merged 11 commits into master from vandonr/payload on Jan 2, 2024


@vandonr (Contributor) commented Nov 15, 2023

I worked on this topic in #6045, but on closer review I realized that I had forgotten to instrument the produce side, which, according to the spec, should send the payload size as well.

To minimize the performance impact of computing the payload size, I reuse the result of a size computation that the Kafka client already performs as part of its send code (here), by instrumenting this method.

To do that, the solution I found was to intercept the code where the stats are sent, hold them until the payload size is known, and only send them then.
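A minimal sketch of this deferral in plain Java, for illustration only. It assumes a simplified `StatsPoint` with three fields, and the names `DeferredStatsReporter`, `onCheckpoint` and `onPayloadSizeComputed` are hypothetical, not dd-trace-java's actual API; it also leaves open where the pending point is stored. What it shows: the checkpoint no longer emits the point, and the hook that observes the size already computed by the Kafka client attaches that size and emits it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for a DSM stats point; the real class has more fields.
final class StatsPoint {
  final String edgeTag;        // e.g. "topic:orders"
  final long hash;             // pathway hash at this checkpoint
  final long payloadSizeBytes; // 0 until the size is known

  StatsPoint(String edgeTag, long hash, long payloadSizeBytes) {
    this.edgeTag = edgeTag;
    this.hash = hash;
    this.payloadSizeBytes = payloadSizeBytes;
  }

  // Returns a copy carrying the payload size; points stay immutable.
  StatsPoint withPayloadSize(long sizeBytes) {
    return new StatsPoint(edgeTag, hash, sizeBytes);
  }
}

// Hypothetical reporter: instead of emitting the point at checkpoint time,
// it holds it until the producer-side size computation has run.
final class DeferredStatsReporter {
  private final Consumer<StatsPoint> sink;
  private StatsPoint pending; // not thread-safe: one reporter per in-flight send

  DeferredStatsReporter(Consumer<StatsPoint> sink) {
    this.sink = sink;
  }

  // Called where the stats point used to be sent directly.
  void onCheckpoint(StatsPoint point) {
    pending = point;
  }

  // Called from the hook that observes the size already computed by the client.
  void onPayloadSizeComputed(long sizeBytes) {
    if (pending == null) {
      return; // nothing was checkpointed, so there is nothing to send
    }
    sink.accept(pending.withPayloadSize(sizeBytes));
    pending = null;
  }
}

class DeferredStatsDemo {
  public static void main(String[] args) {
    List<StatsPoint> sent = new ArrayList<>();
    DeferredStatsReporter reporter = new DeferredStatsReporter(sent::add);
    reporter.onCheckpoint(new StatsPoint("topic:orders", 42L, 0L));
    System.out.println(sent.size()); // 0: the point is held back
    reporter.onPayloadSizeComputed(128L);
    System.out.println(sent.get(0).payloadSizeBytes); // 128
  }
}
```

One consequence of this design, visible in the sketch: if the size hook never runs for a given send, the held point is never emitted.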

I see three ways to store the stats until we can send them:

  • a ContextStore of <Span, StatsPoint>, but I don't think we want to use spans as context-store keys
  • serializing the StatsPoint to a string and storing it as a baggageItem on the span, but that involves unnecessary string manipulation
  • adding a field to the pathway context, which is what I did.
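The chosen option can be pictured with the hedged sketch below. `PathwayContext` here is a hypothetical, heavily trimmed stand-in; the hash is passed in rather than computed the way Data Streams Monitoring actually computes it; and `setCheckpoint`, `takeSavedStats` and `ProduceSideReport` are illustrative names whose real counterparts, if any, have different signatures. It shows the appeal of the third option: the pending stats point travels with the pathway context as a plain object reference, so it needs neither a span-keyed context store nor string serialization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified stats point (illustrative fields only).
final class StatsPoint {
  final String edgeTag;
  final long hash;
  final long payloadSizeBytes;

  StatsPoint(String edgeTag, long hash, long payloadSizeBytes) {
    this.edgeTag = edgeTag;
    this.hash = hash;
    this.payloadSizeBytes = payloadSizeBytes;
  }
}

// Hypothetical, trimmed-down pathway context: only the hash and the new field.
final class PathwayContext {
  private long hash;
  // The added field: a stats point computed at checkpoint time, not yet sent.
  private StatsPoint savedStats;

  // Records the checkpoint and saves its stats point instead of sending it.
  // The returned hash is what would be injected into the message headers.
  long setCheckpoint(String edgeTag, long newHash) {
    hash = newHash;
    savedStats = new StatsPoint(edgeTag, newHash, 0L);
    return hash;
  }

  // Returns the saved point once and clears the field; later calls return null.
  StatsPoint takeSavedStats() {
    StatsPoint saved = savedStats;
    savedStats = null;
    return saved;
  }
}

// Hypothetical produce-side hook, run once the client has computed the size.
final class ProduceSideReport {
  static boolean reportWithSize(PathwayContext ctx, long sizeBytes, Consumer<StatsPoint> sink) {
    StatsPoint saved = ctx.takeSavedStats();
    if (saved == null) {
      return false; // no checkpoint was taken for this message
    }
    sink.accept(new StatsPoint(saved.edgeTag, saved.hash, sizeBytes));
    return true;
  }

  public static void main(String[] args) {
    List<StatsPoint> sent = new ArrayList<>();
    PathwayContext ctx = new PathwayContext();
    long injectedHash = ctx.setCheckpoint("topic:orders", 7L); // headers get this hash now
    reportWithSize(ctx, 512L, sent::add);                       // stats leave only here
    System.out.println(injectedHash + " " + sent.get(0).payloadSizeBytes); // 7 512
  }
}
```

In this sketch the cost is one extra reference per pathway context, set at checkpoint time and cleared as soon as the size arrives.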

@vandonr requested a review from amarziali on November 15, 2023 18:02
@amarziali (Collaborator) left a comment

If payload size stats are really required on the producer side, separating the stats calculation from the injection is a necessary step at this point.
It looks OK, but please request an explicit review from the apm-java and DSM folks (because of the API change).

pr-commenter bot commented Dec 19, 2023

Kafka / producer-benchmark

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| git_branch | master | vandonr/payload |
| git_commit_date | 1703709122 | 1703764880 |
| git_commit_sha | 97e88a8 | 60e046d |

Matching parameters (same for baseline and candidate):

| Parameter | Value |
|---|---|
| ci_job_date | 1703766122 |
| ci_job_id | 397336125 |
| ci_pipeline_id | 25643273 |
| cpu_model | Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz |
| jdkVersion | 11.0.21 |
| jmhVersion | 1.36 |
| jvm | /usr/lib/jvm/java-11-openjdk-amd64/bin/java |
| jvmArgs | -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/producer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant |
| vmName | OpenJDK 64-Bit Server VM |
| vmVersion | 11.0.21+9-post-Ubuntu-0ubuntu122.04 |

Summary

Found 1 performance improvement and 0 performance regressions! Performance is the same for 2 metrics, 0 unstable metrics.

| Scenario | Δ mean throughput |
|---|---|
| scenario:only-tracing-dsm-disabled-benchmarks/KafkaProduceBenchmark.benchProduce | better: [+4560.694 op/s; +13583.476 op/s] or [+3.358%; +10.002%] |

Unchanged results:

| Scenario | Δ mean throughput |
|---|---|
| scenario:not-instrumented/KafkaProduceBenchmark.benchProduce | same |
| scenario:only-tracing-dsm-enabled-benchmarks/KafkaProduceBenchmark.benchProduce | same |

pr-commenter bot commented Dec 19, 2023

Kafka / consumer-benchmark

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| git_branch | master | vandonr/payload |
| git_commit_date | 1703709122 | 1703764880 |
| git_commit_sha | 97e88a8 | 60e046d |

Matching parameters (same for baseline and candidate):

| Parameter | Value |
|---|---|
| ci_job_date | 1703766033 |
| ci_job_id | 397336126 |
| ci_pipeline_id | 25643273 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| jdkVersion | 11.0.21 |
| jmhVersion | 1.36 |
| jvm | /usr/lib/jvm/java-11-openjdk-amd64/bin/java |
| jvmArgs | -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/go/src/github.com/DataDog/apm-reliability/dd-trace-java/platform/src/consumer-benchmark/build/tmp/jmh -Duser.country=US -Duser.language=en -Duser.variant |
| vmName | OpenJDK 64-Bit Server VM |
| vmVersion | 11.0.21+9-post-Ubuntu-0ubuntu122.04 |

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 3 metrics, 0 unstable metrics.

Unchanged results:

| Scenario | Δ mean throughput |
|---|---|
| scenario:not-instrumented/KafkaConsumerBenchmark.benchConsume | same |
| scenario:only-tracing-dsm-disabled-benchmarks/KafkaConsumerBenchmark.benchConsume | same |
| scenario:only-tracing-dsm-enabled-benchmarks/KafkaConsumerBenchmark.benchConsume | same |

pr-commenter bot commented Dec 19, 2023

Benchmarks

Startup

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| git_branch | master | vandonr/payload |
| git_commit_date | 1703709122 | 1703764880 |
| git_commit_sha | 97e88a8 | 60e046d |
| release_version | 1.27.0-SNAPSHOT~97e88a8887 | 1.27.0-SNAPSHOT~60e046dffa |

Matching parameters (same for baseline and candidate):

| Parameter | Value |
|---|---|
| application | insecure-bank |
| ci_job_date | 1703767402 |
| ci_job_id | 397336124 |
| ci_pipeline_id | 25643273 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| module | Agent |
| parent | None |
| variant | iast |

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 45 metrics, 9 unstable metrics.

Startup time reports for petclinic

Chart: petclinic - global startup overhead (candidate=1.27.0-SNAPSHOT~60e046dffa, baseline=1.27.0-SNAPSHOT~97e88a8887). The Agent and Total durations it plots are listed in the results below.
Baseline results:

| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.049 s | - |
| Agent | appsec | 1.151 s | 101.974 ms (9.7%) |
| Agent | iast | 1.168 s | 118.27 ms (11.3%) |
| Agent | profiling | 1.255 s | 205.24 ms (19.6%) |
| Total | tracing | 9.345 s | - |
| Total | appsec | 9.469 s | 124.322 ms (1.3%) |
| Total | iast | 9.602 s | 257.183 ms (2.8%) |
| Total | profiling | 9.746 s | 401.257 ms (4.3%) |

Candidate results:

| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.053 s | - |
| Agent | appsec | 1.156 s | 103.468 ms (9.8%) |
| Agent | iast | 1.169 s | 116.412 ms (11.1%) |
| Agent | profiling | 1.245 s | 192.723 ms (18.3%) |
| Total | tracing | 9.361 s | - |
| Total | appsec | 9.451 s | 90.329 ms (1.0%) |
| Total | iast | 9.586 s | 225.181 ms (2.4%) |
| Total | profiling | 9.615 s | 254.366 ms (2.7%) |
Breakdown per module for petclinic (candidate=1.27.0-SNAPSHOT~60e046dffa, baseline=1.27.0-SNAPSHOT~97e88a8887):

| Variant | Module | Baseline | Candidate |
|---|---|---|---|
| tracing | BytebuddyAgent | 649.822 ms | 651.442 ms |
| tracing | GlobalTracer | 306.884 ms | 308.494 ms |
| tracing | AppSec | 50.557 ms | 50.482 ms |
| tracing | Remote Config | 672.99 µs | 672.336 µs |
| tracing | Telemetry | 7.298 ms | 7.208 ms |
| appsec | BytebuddyAgent | 653.323 ms | 655.211 ms |
| appsec | GlobalTracer | 307.605 ms | 309.143 ms |
| appsec | AppSec | 148.57 ms | 149.513 ms |
| appsec | Remote Config | 647.962 µs | 651.346 µs |
| appsec | Telemetry | 6.931 ms | 6.989 ms |
| iast | BytebuddyAgent | 769.024 ms | 770.189 ms |
| iast | GlobalTracer | 285.069 ms | 284.704 ms |
| iast | AppSec | 50.786 ms | 52.581 ms |
| iast | Remote Config | 570.671 µs | 593.929 µs |
| iast | Telemetry | 8.71 ms | 7.202 ms |
| iast | IAST | 19.221 ms | 19.425 ms |
| profiling | BytebuddyAgent | 664.903 ms | 659.495 ms |
| profiling | GlobalTracer | 379.568 ms | 377.594 ms |
| profiling | AppSec | 51.602 ms | 50.982 ms |
| profiling | Remote Config | 997.608 µs | 988.027 µs |
| profiling | Telemetry | 7.311 ms | 7.174 ms |
| profiling | ProfilingAgent | 95.544 ms | 94.744 ms |
| profiling | Profiling | 95.569 ms | 94.769 ms |
Startup time reports for insecure-bank

Chart: insecure-bank - global startup overhead (candidate=1.27.0-SNAPSHOT~60e046dffa, baseline=1.27.0-SNAPSHOT~97e88a8887). The Agent and Total durations it plots are listed in the results below.
Baseline results:

| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.054 s | - |
| Agent | iast | 1.166 s | 111.433 ms (10.6%) |
| Agent | iast_TELEMETRY_OFF | 1.178 s | 123.586 ms (11.7%) |
| Total | tracing | 8.736 s | - |
| Total | iast | 9.262 s | 526.465 ms (6.0%) |
| Total | iast_TELEMETRY_OFF | 9.33 s | 594.448 ms (6.8%) |

Candidate results:

| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.057 s | - |
| Agent | iast | 1.187 s | 130.095 ms (12.3%) |
| Agent | iast_TELEMETRY_OFF | 1.163 s | 106.278 ms (10.1%) |
| Total | tracing | 8.774 s | - |
| Total | iast | 9.282 s | 508.308 ms (5.8%) |
| Total | iast_TELEMETRY_OFF | 9.27 s | 496.23 ms (5.7%) |
Breakdown per module for insecure-bank (candidate=1.27.0-SNAPSHOT~60e046dffa, baseline=1.27.0-SNAPSHOT~97e88a8887):

| Variant | Module | Baseline | Candidate |
|---|---|---|---|
| tracing | BytebuddyAgent | 652.761 ms | 654.479 ms |
| tracing | GlobalTracer | 308.409 ms | 309.109 ms |
| tracing | AppSec | 50.81 ms | 50.999 ms |
| tracing | Remote Config | 680.044 µs | 677.199 µs |
| tracing | Telemetry | 7.282 ms | 7.253 ms |
| iast | BytebuddyAgent | 768.007 ms | 782.963 ms |
| iast | GlobalTracer | 284.789 ms | 289.655 ms |
| iast | AppSec | 52.833 ms | 53.504 ms |
| iast | Remote Config | 560.93 µs | 567.641 µs |
| iast | Telemetry | 6.412 ms | 6.475 ms |
| iast | IAST | 18.957 ms | 19.217 ms |
| iast_TELEMETRY_OFF | BytebuddyAgent | 773.174 ms | 762.961 ms |
| iast_TELEMETRY_OFF | GlobalTracer | 289.045 ms | 285.879 ms |
| iast_TELEMETRY_OFF | AppSec | 50.047 ms | 49.341 ms |
| iast_TELEMETRY_OFF | Remote Config | 599.726 µs | 623.637 µs |
| iast_TELEMETRY_OFF | Telemetry | 6.532 ms | 7.425 ms |
| iast_TELEMETRY_OFF | IAST | 23.826 ms | 22.691 ms |

Load

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| end_time | 2023-12-28T12:22:29 | 2023-12-28T12:39:05 |
| git_branch | master | vandonr/payload |
| git_commit_date | 1703709122 | 1703764880 |
| git_commit_sha | 97e88a8 | 60e046d |
| release_version | 1.27.0-SNAPSHOT~97e88a8887 | 1.27.0-SNAPSHOT~60e046dffa |
| start_time | 2023-12-28T12:22:15 | 2023-12-28T12:38:52 |

Matching parameters (same for baseline and candidate):

| Parameter | Value |
|---|---|
| application | insecure-bank |
| ci_job_date | 1703767402 |
| ci_job_id | 397336124 |
| ci_pipeline_id | 25643273 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| variant | iast |

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 10 metrics, 12 unstable metrics.

Request duration reports for petclinic

Chart: petclinic - request duration [CI 0.99] (candidate=1.27.0-SNAPSHOT~60e046dffa, baseline=1.27.0-SNAPSHOT~97e88a8887). The means and 99% confidence intervals it plots are listed in the results below.
Baseline results:

| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 1.359 ms [1.34 ms, 1.378 ms] | - |
| appsec | 1.781 ms [1.755 ms, 1.806 ms] | 421.743 µs (31.0%) |
| iast | 1.523 ms [1.499 ms, 1.548 ms] | 164.419 µs (12.1%) |
| profiling | 1.52 ms [1.494 ms, 1.545 ms] | 160.646 µs (11.8%) |
| tracing | 1.495 ms [1.47 ms, 1.52 ms] | 136.002 µs (10.0%) |

Candidate results:

| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 1.369 ms [1.349 ms, 1.39 ms] | - |
| appsec | 1.782 ms [1.756 ms, 1.807 ms] | 412.105 µs (30.1%) |
| iast | 1.513 ms [1.488 ms, 1.537 ms] | 143.231 µs (10.5%) |
| profiling | 1.548 ms [1.523 ms, 1.573 ms] | 178.603 µs (13.0%) |
| tracing | 1.494 ms [1.469 ms, 1.519 ms] | 124.599 µs (9.1%) |
Request duration reports for insecure-bank

Chart: insecure-bank - request duration [CI 0.99] (candidate=1.27.0-SNAPSHOT~60e046dffa, baseline=1.27.0-SNAPSHOT~97e88a8887). The means and 99% confidence intervals it plots are listed in the results below.
Baseline results:

| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 368.796 µs [348.571 µs, 389.021 µs] | - |
| iast | 485.964 µs [464.948 µs, 506.981 µs] | 117.168 µs (31.8%) |
| iast_FULL | 547.66 µs [527.161 µs, 568.158 µs] | 178.864 µs (48.5%) |
| iast_INACTIVE | 454.316 µs [433.852 µs, 474.78 µs] | 85.52 µs (23.2%) |
| iast_TELEMETRY_OFF | 471.194 µs [450.801 µs, 491.586 µs] | 102.398 µs (27.8%) |
| tracing | 445.616 µs [425.264 µs, 465.969 µs] | 76.82 µs (20.8%) |

Candidate results:

| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 369.143 µs [348.774 µs, 389.512 µs] | - |
| iast | 488.687 µs [467.587 µs, 509.787 µs] | 119.544 µs (32.4%) |
| iast_FULL | 542.815 µs [522.341 µs, 563.288 µs] | 173.671 µs (47.0%) |
| iast_INACTIVE | 450.163 µs [429.484 µs, 470.842 µs] | 81.02 µs (21.9%) |
| iast_TELEMETRY_OFF | 480.496 µs [460.043 µs, 500.948 µs] | 111.352 µs (30.2%) |
| tracing | 441.279 µs [420.868 µs, 461.691 µs] | 72.136 µs (19.5%) |

@vandonr marked this pull request as ready for review on December 27, 2023 16:29
@vandonr requested a review from a team as a code owner on December 27, 2023 16:29
@vandonr requested reviews from mcculls and am312 on December 27, 2023 16:29
@bantonsson (Contributor) left a comment

Looks good. Only one minor question.

@vandonr merged commit 7440eac into master on Jan 2, 2024
75 checks passed
@vandonr deleted the vandonr/payload branch on January 2, 2024 09:50
@github-actions bot added this to the 1.27.0 milestone on Jan 2, 2024
@PerfectSlayer added the "inst: others" (All other instrumentations) label on Jan 3, 2024
4 participants