Add basic Iceberg/Hive dataset parsing in Spark instrumentation #8152
base: master
…l plan from spark 3
… in sql update and delete
...on/spark/src/main/java/datadog/trace/instrumentation/spark/AbstractDatadogSparkListener.java
Benchmarks
Startup
Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 59 metrics, 4 unstable metrics.
Startup time reports for petclinic
Chart: petclinic - global startup overhead; candidate=1.46.0-SNAPSHOT~825c722451, baseline=1.46.0-SNAPSHOT~07e5d634ce; sections: tracing, appsec, iast, profiling
Chart: petclinic - break down per module; candidate=1.46.0-SNAPSHOT~825c722451, baseline=1.46.0-SNAPSHOT~07e5d634ce; sections: tracing, appsec, iast, profiling
Startup time reports for insecure-bank
Chart: insecure-bank - global startup overhead; candidate=1.46.0-SNAPSHOT~825c722451, baseline=1.46.0-SNAPSHOT~07e5d634ce; sections: tracing, iast, iast_HARDCODED_SECRET_DISABLED, iast_TELEMETRY_OFF
Chart: insecure-bank - break down per module; candidate=1.46.0-SNAPSHOT~825c722451, baseline=1.46.0-SNAPSHOT~07e5d634ce; sections: tracing, iast, iast_HARDCODED_SECRET_DISABLED, iast_TELEMETRY_OFF
Load
Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 12 metrics, 16 unstable metrics.
Request duration reports for insecure-bank
Chart: insecure-bank - request duration (CI 0.99); candidate=1.46.0-SNAPSHOT~825c722451, baseline=1.46.0-SNAPSHOT~07e5d634ce; variants: no_agent, iast, iast_FULL, iast_GLOBAL, iast_HARDCODED_SECRET_DISABLED, iast_INACTIVE, iast_TELEMETRY_OFF, tracing
Request duration reports for petclinic
Chart: petclinic - request duration (CI 0.99); candidate=1.46.0-SNAPSHOT~825c722451, baseline=1.46.0-SNAPSHOT~07e5d634ce; variants: no_agent, appsec, appsec_no_iast, iast, profiling, tracing
Dacapo
Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 12 metrics, 0 unstable metrics.
Execution time for tomcat
Chart: tomcat - execution time (CI 0.99); candidate=1.46.0-SNAPSHOT~825c722451, baseline=1.46.0-SNAPSHOT~07e5d634ce; variants: no_agent, appsec, iast, iast_GLOBAL, profiling, tracing
Execution time for biojava
Chart: biojava - execution time (CI 0.99); candidate=1.46.0-SNAPSHOT~825c722451, baseline=1.46.0-SNAPSHOT~07e5d634ce; variants: no_agent, appsec, iast, iast_GLOBAL, profiling, tracing
…easier parsing downstream
…k instrumentation, default to true
…rk dataset lineage, default to 500
…en if enclosing sql span is not found
What Does This Do
Adds basic Iceberg/Hive dataset information to the Spark application span by parsing the LogicalPlan in the Spark instrumentation.
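To illustrate the general idea of pulling dataset names out of a plan tree, here is a minimal, self-contained Java sketch. It does not use Spark: PlanNode, its "Relation" kind, and collectDatasets are hypothetical stand-ins for Spark's LogicalPlan nodes and for whatever extraction the instrumentation actually performs. The sketch walks the tree depth-first, collects distinct table identifiers from relation nodes, and stops once a cap is reached (the cap is an assumption meant to mirror the role of a lineage limit).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LineageSketch {
  // Hypothetical stand-in for a plan node; this is NOT Spark's LogicalPlan API.
  static class PlanNode {
    final String kind;            // illustrative labels such as "Relation", "Filter", "Join"
    final String table;           // table identifier for relation nodes, null otherwise
    final List<PlanNode> children;

    PlanNode(String kind, String table, PlanNode... children) {
      this.kind = kind;
      this.table = table;
      this.children = List.of(children);
    }
  }

  /** Depth-first walk collecting distinct table names from relation nodes, capped at limit. */
  static List<String> collectDatasets(PlanNode root, int limit) {
    Set<String> seen = new LinkedHashSet<>(); // keeps first-seen order and removes duplicates
    collect(root, limit, seen);
    return new ArrayList<>(seen);
  }

  private static void collect(PlanNode node, int limit, Set<String> seen) {
    if (node == null || seen.size() >= limit) {
      return; // stop descending once the cap is reached
    }
    if ("Relation".equals(node.kind) && node.table != null) {
      seen.add(node.table);
    }
    for (PlanNode child : node.children) {
      collect(child, limit, seen);
    }
  }

  public static void main(String[] args) {
    // Hypothetical plan: a join of a filtered Iceberg table with a Hive table.
    PlanNode plan = new PlanNode("Join", null,
        new PlanNode("Filter", null, new PlanNode("Relation", "iceberg.db.orders")),
        new PlanNode("Relation", "hive.db.customers"));
    System.out.println(collectDatasets(plan, 500)); // prints [iceberg.db.orders, hive.db.customers]
  }
}
```

In real Spark code the traversal would run over LogicalPlan children and match the concrete relation classes used for Iceberg and Hive tables; which classes those are is not stated in this PR description.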
Motivation
The motivation is to move toward the long-term goal of end-to-end pipeline lineage covering both Spark jobs and datasets, which supports root cause analysis and impact analysis across data jobs and datasets.
Additional Notes
The following configuration options (system property / environment variable) are introduced to control this feature:
dd.spark.data.lineage.enabled / DD_SPARK_DATA_LINEAGE_ENABLED
dd.spark.data.lineage.limit / DD_SPARK_DATA_LINEAGE_LIMIT
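A minimal sketch of how these two options might be resolved, assuming the system property takes precedence over the environment variable, and assuming defaults of true and 500 (inferred from the commit titles "…k instrumentation, default to true" and "…rk dataset lineage, default to 500"). LineageConfigSketch and its methods are hypothetical; this is not the tracer's actual configuration code.

```java
import java.util.Map;

public class LineageConfigSketch {
  // Assumed defaults, inferred from this PR's commit titles; not confirmed by the description.
  static final boolean DEFAULT_ENABLED = true;
  static final int DEFAULT_LIMIT = 500;

  /** Returns the system property if set, else the environment variable, else null (assumed precedence). */
  static String lookup(String sysProp, String envVar, Map<String, String> env) {
    String value = System.getProperty(sysProp);
    if (value == null) {
      value = env.get(envVar);
    }
    return value;
  }

  static boolean lineageEnabled(Map<String, String> env) {
    String v = lookup("dd.spark.data.lineage.enabled", "DD_SPARK_DATA_LINEAGE_ENABLED", env);
    return v == null ? DEFAULT_ENABLED : Boolean.parseBoolean(v.trim());
  }

  static int lineageLimit(Map<String, String> env) {
    String v = lookup("dd.spark.data.lineage.limit", "DD_SPARK_DATA_LINEAGE_LIMIT", env);
    if (v == null) {
      return DEFAULT_LIMIT;
    }
    try {
      return Integer.parseInt(v.trim());
    } catch (NumberFormatException e) {
      return DEFAULT_LIMIT; // fall back to the default on malformed input
    }
  }

  public static void main(String[] args) {
    // The environment is passed in as a Map so it can be replaced in tests.
    Map<String, String> env = System.getenv();
    System.out.println("enabled=" + lineageEnabled(env) + ", limit=" + lineageLimit(env));
  }
}
```

Under these assumptions, launching with -Ddd.spark.data.lineage.limit=100, or exporting DD_SPARK_DATA_LINEAGE_LIMIT=100 with no system property set, would make lineageLimit return 100.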
Contributor Checklist
Assign the type: and (comp: or inst:) labels in addition to any useful labels.
Don't use close, fix or any linking keywords when referencing an issue. Use solves instead, and assign the PR milestone to the issue.
Jira ticket: [PROJ-IDENT]