fix low hanging fruits for render performance #4485

SimonDanisch · 2024-10-15T20:47:12Z

Description

using GLMakie
using BenchmarkTools
f, ax, pl = scatter(1:5);
sc = display(f)
@btime GLMakie.render_frame(sc)

State	time [μs]	allocations	allocated [KiB]
Makie master	118.770	1193	30.41
With sorting change	104.764	729	25.48
With framebuffer_size optimization	85.648	727	25.45
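The relative gains follow directly from the three measurements above. Below is a small Julia sketch of that arithmetic. It uses only the reported numbers (`@btime` reports the minimum time over all samples) and compares each state against master, so the framebuffer_size row includes the sorting change.

```julia
# Timings (μs) and allocation counts reported above for `render_frame`
master   = (time = 118.770, allocs = 1193)
sorting  = (time = 104.764, allocs = 729)
framebuf = (time = 85.648,  allocs = 727)

# Speedup relative to master: values > 1 mean faster
speedup(new) = master.time / new.time

# Fraction of allocations removed relative to master
alloc_reduction(new) = 1 - new.allocs / master.allocs

for (name, r) in (("sorting change", sorting), ("framebuffer_size", framebuf))
    println(rpad(name, 18), round(speedup(r); digits = 2), "x faster, ",
            round(100 * alloc_reduction(r); digits = 1), "% fewer allocations")
end
```

With both changes, `render_frame` comes out roughly 1.39x faster with about 39% fewer allocations than master.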

MakieBot · 2024-10-15T21:06:40Z

Compile Times benchmark

Note that these numbers may fluctuate on the CI servers, so take them with a grain of salt. All benchmark results are based on the mean time, and a negative percentage means faster than the base branch. Note also that GLMakie and WGLMakie run on an emulated GPU, so their runtime benchmarks are much slower. Results are from running:

using_time = @ctime using Backend
# Compile time
create_time = @ctime fig = scatter(1:4; color=1:4, colormap=:turbo, markersize=20, visible=true)
display_time = @ctime Makie.colorbuffer(display(fig))
# Runtime
create_time = @benchmark fig = scatter(1:4; color=1:4, colormap=:turbo, markersize=20, visible=true)
display_time = @benchmark Makie.colorbuffer(fig)

	using	create (compile)	display (compile)	create (runtime)	display (runtime)
GLMakie	5.08s (5.05, 5.16) 0.04+-	109.63ms (108.44, 112.04) 1.51+-	420.26ms (417.63, 426.82) 3.45+-	9.45ms (9.32, 9.54) 0.08+-	25.72ms (25.61, 26.09) 0.17+-
master	5.07s (5.03, 5.14) 0.03+-	109.24ms (107.98, 110.90) 1.04+-	653.52ms (645.92, 661.46) 4.91+-	8.32ms (8.25, 8.41) 0.06+-	25.75ms (25.62, 26.19) 0.20+-
evaluation	1.00x invariant, 0.01s (0.34d, 0.53p, 0.04std)	1.00x invariant, 0.39ms (0.30d, 0.58p, 1.27std)	1.56x faster✅, -233.26ms (-54.96d, 0.00p, 4.18std)	0.88x slower❌, 1.13ms (15.87d, 0.00p, 0.07std)	1.00x invariant, -0.03ms (-0.18d, 0.75p, 0.18std)
CairoMakie	5.21s (5.12, 5.28) 0.06+-	115.04ms (111.63, 119.05) 2.86+-	172.68ms (167.11, 175.90) 3.85+-	9.66ms (9.44, 10.15) 0.22+-	1.29ms (1.25, 1.31) 0.02+-
master	5.10s (5.03, 5.15) 0.04+-	116.53ms (111.87, 119.73) 2.96+-	175.60ms (170.60, 180.09) 3.62+-	9.95ms (9.52, 10.45) 0.34+-	1.23ms (1.17, 1.25) 0.03+-
evaluation	0.98x slower❌, 0.1s (1.97d, 0.00p, 0.05std)	1.01x invariant, -1.49ms (-0.51d, 0.36p, 2.91std)	1.02x invariant, -2.92ms (-0.78d, 0.17p, 3.73std)	1.03x invariant, -0.28ms (-0.98d, 0.10p, 0.28std)	0.95x slower❌, 0.07ms (2.61d, 0.00p, 0.03std)
WGLMakie	5.69s (5.51, 5.85) 0.11+-	117.58ms (109.92, 124.66) 5.82+-	5.33s (4.87, 5.60) 0.28+-	14.03ms (13.48, 14.72) 0.50+-	137.67ms (131.05, 146.08) 5.25+-
master	5.59s (5.49, 5.77) 0.09+-	117.56ms (112.00, 127.43) 5.46+-	5.72s (5.56, 5.98) 0.16+-	13.25ms (12.62, 14.38) 0.59+-	133.13ms (130.56, 136.30) 2.07+-
evaluation	0.98x invariant, 0.1s (0.99d, 0.09p, 0.10std)	1.00x invariant, 0.02ms (0.00d, 0.99p, 5.64std)	1.07x faster✅, -0.39s (-1.73d, 0.01p, 0.22std)	0.94x slower❌, 0.78ms (1.42d, 0.02p, 0.54std)	0.97x invariant, 4.55ms (1.14d, 0.07p, 3.66std)
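The ratio and difference in each `evaluation` row appear to come from the two means above it. The ratio seems to be the master mean divided by the PR mean, so values above 1 mean the PR is faster. The difference seems to be the PR mean minus the master mean. These formulas are inferred from the table rather than taken from the bot's code, and the `d`, `p` and `std` statistics are not reproduced. A Julia sketch that checks a few cells:

```julia
# Inferred formulas (assumption: checked against the table, not the bot's source)
ratio(pr, master) = master / pr        # > 1 means the PR is faster
difference(pr, master) = pr - master   # negative means the PR is faster

# GLMakie, display (compile), means in ms
@show round(ratio(420.26, 653.52); digits = 2)       # 1.56
@show round(difference(420.26, 653.52); digits = 2)  # -233.26

# GLMakie, create (runtime), means in ms
@show round(ratio(9.45, 8.32); digits = 2)           # 0.88
@show round(difference(9.45, 8.32); digits = 2)      # 1.13
```

Both cells match the `1.56x faster✅, -233.26ms` and `0.88x slower❌, 1.13ms` entries in the GLMakie evaluation row.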

ffreyer · 2024-10-16T22:47:45Z

This is failing because lines drop the model uniform when linestyles are used, since it is not used in that case.

MakieBot · 2024-10-29T14:04:57Z

Benchmark Results

SHA: 96d592f9586ac63ff051fb7abc3db079f8db68a2

Warning

These results are subject to substantial noise because GitHub's CI runs on shared machines that are not ideally suited for benchmarking.

ffreyer · 2024-11-04T20:34:38Z

Same benchmark code, different sorting options:

Change	time [µs]	allocations	allocated [KiB]
merge master	77.2	727	25.45
rollback sortby function	90.8	1191	30.38
typed sortby	91.2	1191	30.38
inline `transformationmatrix(plot)`	79.6	758	29.81
inline `plot.model[]`	80.7	696	21.09
using `zvalue2d` with `transformationmatrix()`	84.6	820	23.03

Calling zvalue2d seems to have significant overhead compared to doing what it does directly, perhaps due to runtime dispatch. plot.model[] is also slower than transformationmatrix(), maybe due to type instability. For now I went with calling transformationmatrix(), which is pretty close to the original optimization.
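The runtime-dispatch explanation above is a guess ("perhaps"). The following minimal Julia sketch shows only the general mechanism it refers to. When a value comes out of an untyped container, the compiler cannot infer what it returns, and every call on it is dispatched at runtime. A type assertion, or a concretely typed field, restores inference. `UntypedPlot`, `TypedPlot`, `zvalue` and `zvalue_asserted` are hypothetical stand-ins, not Makie types or functions.

```julia
using Test

# Hypothetical stand-ins: one stores the model matrix in an untyped
# attribute dictionary, the other in a concretely typed field.
struct UntypedPlot
    attributes::Dict{Symbol,Any}
end

struct TypedPlot
    model::Matrix{Float32}
end

# The z translation of a 4x4 model matrix sits at row 3, column 4.
zvalue(p::UntypedPlot) = p.attributes[:model][3, 4]   # inferred as Any
zvalue(p::TypedPlot) = p.model[3, 4]                  # inferred as Float32

# A type assertion tells the compiler what the Any-typed lookup returns.
zvalue_asserted(p::UntypedPlot) = (p.attributes[:model]::Matrix{Float32})[3, 4]

model = Matrix{Float32}([1 0 0 0; 0 1 0 0; 0 0 1 2; 0 0 0 1])
untyped = UntypedPlot(Dict{Symbol,Any}(:model => model))
typed = TypedPlot(model)

println(only(Base.return_types(zvalue, (UntypedPlot,))))
println(only(Base.return_types(zvalue, (TypedPlot,))))
println(only(Base.return_types(zvalue_asserted, (UntypedPlot,))))
@inferred zvalue(typed)  # passes; `@inferred zvalue(untyped)` would throw
```

`@code_warntype zvalue(untyped)` shows the same `Any` interactively. Which of these effects actually applies to `zvalue2d` or `plot.model[]` is not established in this thread.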

Moving some clip planes code around to make setup_clip_planes() type stable got me to 53.3µs, 272 allocations, 18 KiB. I'm waiting on CI before I push that.
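The actual change to `setup_clip_planes()` is not shown here (it landed in commit 96d592f, "fix runtime dispatch with clip planes"). As a hedged illustration, the sketch below shows a common way to make such code type stable: a function barrier, as described in the Julia performance tips. The untyped lookup happens once in an outer function, and the loop runs in an inner function that is specialized on the concrete argument type. `Plane`, `MAX_PLANES`, `pack_planes!` and `setup_planes!` are hypothetical names, not GLMakie's.

```julia
# Hypothetical clip plane: a unit normal and a signed distance from the origin.
struct Plane
    normal::NTuple{3,Float32}
    distance::Float32
end

# Hypothetical fixed number of plane slots, as a shader uniform array might have.
const MAX_PLANES = 8

# Kernel behind the barrier: `planes` is concretely typed here, so the loop
# compiles to specialized code without runtime dispatch.
function pack_planes!(out::Vector{NTuple{4,Float32}}, planes::Vector{Plane})
    n = min(length(planes), MAX_PLANES)
    for i in 1:n
        p = planes[i]
        out[i] = (p.normal..., p.distance)
    end
    # Zero the unused slots.
    for i in (n + 1):MAX_PLANES
        out[i] = (0f0, 0f0, 0f0, 0f0)
    end
    return n
end

# Outer function: the lookup in an Any-valued Dict is not inferable, but that
# costs one dynamic dispatch at the call to `pack_planes!`, not one per plane.
function setup_planes!(out::Vector{NTuple{4,Float32}}, attributes::Dict{Symbol,Any})
    return pack_planes!(out, attributes[:clip_planes])
end

attributes = Dict{Symbol,Any}(:clip_planes => [Plane((0f0, 0f0, 1f0), 0.5f0)])
out = Vector{NTuple{4,Float32}}(undef, MAX_PLANES)
n = setup_planes!(out, attributes)
```

The barrier keeps the dynamic part of the code small and fixed in cost. Per-frame work then scales with the number of planes only inside the specialized kernel.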

SimonDanisch · 2024-11-04T22:43:31Z

GLMakie/src/rendering.jl

@@ -31,7 +31,12 @@ function render_frame(screen::Screen; resize_buffers=true)
    ShaderAbstractions.switch_context!(nw)

    function sortby(x)
-        return x[3][:model][][3, 4]
+        robj = x[3]
+        plot = screen.cache2plot[robj.id]


The plot lookup was the expensive bit here, so it would be nice if we could avoid it!

ffreyer · 2024-11-05

Dict lookups are O(1) and take somewhere around 10 ns. My benchmarks for the current solution show only a 3% difference to the old one.
Using robj[:model] is also what made this PR fail: lines with linestyles apply the model matrix on the CPU, so it does not end up in the uniforms. I would also expect this to subtly fail with f32 converts, because those can change the model matrix (setting it to I after applying it on the CPU).
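The diff above is truncated, so the following is not the merged code. It is a minimal Julia sketch of the two points in this thread. Reading the z offset (element `[3, 4]`, as in the removed line) from the render object's `:model` uniform can fail. Looking up the plot's own matrix by id costs one Dict lookup per object. `RenderObject`, `translation_z`, `zvalue_from_uniform`, `sort_by_plot!` and `plot_models` are hypothetical stand-ins. GLMakie's render list entries are tuples whose third element is the render object, but this sketch simplifies them to a plain vector.

```julia
using LinearAlgebra

# Hypothetical stand-in for a render object: an id plus its shader uniforms.
struct RenderObject
    id::UInt32
    uniforms::Dict{Symbol,Any}
end

# Model matrix that only translates along z; the z offset sits at [3, 4].
translation_z(z) = Float32[1 0 0 0; 0 1 0 0; 0 0 1 z; 0 0 0 1]

# Fragile key: reads the uniform, which may have been dropped (KeyError) or
# reset to the identity after the transform was applied on the CPU.
zvalue_from_uniform(robj::RenderObject) = robj.uniforms[:model][3, 4]

# Robust key: one Dict lookup from id to the plot's own model matrix.
function sort_by_plot!(robjs::Vector{RenderObject}, plot_models::Dict{UInt32,Matrix{Float32}})
    return sort!(robjs; by = robj -> plot_models[robj.id][3, 4])
end

plot_models = Dict{UInt32,Matrix{Float32}}(
    1 => translation_z(2), 2 => translation_z(1), 3 => translation_z(0))
robjs = [
    RenderObject(1, Dict{Symbol,Any}(:model => Matrix{Float32}(I, 4, 4))),  # transform applied on the CPU
    RenderObject(2, Dict{Symbol,Any}(:model => translation_z(1))),
    RenderObject(3, Dict{Symbol,Any}()),                                      # unused :model uniform dropped
]

sort_by_plot!(robjs, plot_models)
println(Int.(getfield.(robjs, :id)))  # [3, 2, 1]
```

If `zvalue_from_uniform` were used as the sort key instead, object 1 would be read as z = 0 rather than 2, and object 3 would throw a `KeyError`. These mirror the f32-convert and linestyle failures described above.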

SimonDanisch added 2 commits October 15, 2024 22:39

fix low hanging fruits for render performance

42785ce

Merge branch 'master' into sd/frame-rendering-perf

eb217d5

Merge branch 'master' into sd/frame-rendering-perf

210a96f

SimonDanisch mentioned this pull request Oct 16, 2024

Make VideoStream aware of px_per_unit #4466

Open


ffreyer added the skip-changelog label Oct 16, 2024

Merge branch 'master' into sd/frame-rendering-perf

6dd0415

Merge branch 'master' into sd/frame-rendering-perf

3c4caf5

ffreyer added 2 commits November 4, 2024 20:43

Merge branch 'master' into sd/frame-rendering-perf

f6ddef5

try fix test failures

e118e41

fix runtime dispatch with clip planes

96d592f

SimonDanisch commented Nov 4, 2024


ffreyer merged commit 88746d9 into master Nov 5, 2024
21 of 22 checks passed

ffreyer deleted the sd/frame-rendering-perf branch November 5, 2024 14:04
