fix low hanging fruits for render performance #4485

Merged
ffreyer merged 8 commits into master from sd/frame-rendering-perf
Nov 5, 2024

SimonDanisch
Member

Description

using GLMakie
using BenchmarkTools
f, ax, pl = scatter(1:5);
sc = display(f)
@btime GLMakie.render_frame(sc)

Makie master

118.770 μs (1193 allocations: 30.41 KiB)

With sorting change

104.764 μs (729 allocations: 25.48 KiB)

With framebuffer_size optimization

85.648 μs (727 allocations: 25.45 KiB)
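As a rough reading of the three timings above, the relative speedup of each step can be computed against the master baseline. This is only a sketch: the numbers are copied from the description, while the variable names and the "speedup = baseline time / new time" convention are assumptions made for illustration.

```julia
# Timings (in µs) and allocation counts copied from the description above.
results = [
    ("Makie master",                  118.770, 1193),
    ("sorting change",                104.764,  729),
    ("framebuffer_size optimization",  85.648,  727),
]

baseline_time = results[1][2]
baseline_allocs = results[1][3]

# Speedup is baseline time divided by new time; values above 1 mean faster.
speedups = [round(baseline_time / t; digits=2) for (_, t, _) in results]
# Number of allocations saved relative to master.
saved_allocs = [baseline_allocs - a for (_, _, a) in results]

for ((name, t, a), s, d) in zip(results, speedups, saved_allocs)
    println(rpad(name, 32), t, " µs  ", s, "x  ", d, " fewer allocations")
end
```

Read this way, most of the allocation savings come from the sorting change, while the framebuffer_size optimization mainly reduces time.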

@MakieBot
Collaborator

MakieBot commented Oct 15, 2024

Compile Times benchmark

Note that these numbers may fluctuate on the CI servers, so take them with a grain of salt. All benchmark results are based on the mean time; a negative percentage means faster than the base branch. Note that GLMakie and WGLMakie run on an emulated GPU, so their runtime benchmarks are much slower. Results are from running:

using_time = @ctime using Backend
# Compile time
create_time = @ctime fig = scatter(1:4; color=1:4, colormap=:turbo, markersize=20, visible=true)
display_time = @ctime Makie.colorbuffer(display(fig))
# Runtime
create_time = @benchmark fig = scatter(1:4; color=1:4, colormap=:turbo, markersize=20, visible=true)
display_time = @benchmark Makie.colorbuffer(fig)
| | using | create (compile) | display (compile) | create (runtime) | display (runtime) |
|---|---|---|---|---|---|
| GLMakie | 5.08s (5.05, 5.16) 0.04+- | 109.63ms (108.44, 112.04) 1.51+- | 420.26ms (417.63, 426.82) 3.45+- | 9.45ms (9.32, 9.54) 0.08+- | 25.72ms (25.61, 26.09) 0.17+- |
| master | 5.07s (5.03, 5.14) 0.03+- | 109.24ms (107.98, 110.90) 1.04+- | 653.52ms (645.92, 661.46) 4.91+- | 8.32ms (8.25, 8.41) 0.06+- | 25.75ms (25.62, 26.19) 0.20+- |
| evaluation | 1.00x invariant, 0.01s (0.34d, 0.53p, 0.04std) | 1.00x invariant, 0.39ms (0.30d, 0.58p, 1.27std) | 1.56x faster✅, -233.26ms (-54.96d, 0.00p, 4.18std) | 0.88x slower❌, 1.13ms (15.87d, 0.00p, 0.07std) | 1.00x invariant, -0.03ms (-0.18d, 0.75p, 0.18std) |
| CairoMakie | 5.21s (5.12, 5.28) 0.06+- | 115.04ms (111.63, 119.05) 2.86+- | 172.68ms (167.11, 175.90) 3.85+- | 9.66ms (9.44, 10.15) 0.22+- | 1.29ms (1.25, 1.31) 0.02+- |
| master | 5.10s (5.03, 5.15) 0.04+- | 116.53ms (111.87, 119.73) 2.96+- | 175.60ms (170.60, 180.09) 3.62+- | 9.95ms (9.52, 10.45) 0.34+- | 1.23ms (1.17, 1.25) 0.03+- |
| evaluation | 0.98x slower X, 0.1s (1.97d, 0.00p, 0.05std) | 1.01x invariant, -1.49ms (-0.51d, 0.36p, 2.91std) | 1.02x invariant, -2.92ms (-0.78d, 0.17p, 3.73std) | 1.03x invariant, -0.28ms (-0.98d, 0.10p, 0.28std) | 0.95x slower❌, 0.07ms (2.61d, 0.00p, 0.03std) |
| WGLMakie | 5.69s (5.51, 5.85) 0.11+- | 117.58ms (109.92, 124.66) 5.82+- | 5.33s (4.87, 5.60) 0.28+- | 14.03ms (13.48, 14.72) 0.50+- | 137.67ms (131.05, 146.08) 5.25+- |
| master | 5.59s (5.49, 5.77) 0.09+- | 117.56ms (112.00, 127.43) 5.46+- | 5.72s (5.56, 5.98) 0.16+- | 13.25ms (12.62, 14.38) 0.59+- | 133.13ms (130.56, 136.30) 2.07+- |
| evaluation | 0.98x invariant, 0.1s (0.99d, 0.09p, 0.10std) | 1.00x invariant, 0.02ms (0.00d, 0.99p, 5.64std) | 1.07x faster✅, -0.39s (-1.73d, 0.01p, 0.22std) | 0.94x slower❌, 0.78ms (1.42d, 0.02p, 0.54std) | 0.97x invariant, 4.55ms (1.14d, 0.07p, 3.66std) |
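The factor and the time difference in each evaluation row appear to follow from the two mean times directly above it. The sketch below reproduces the GLMakie evaluation row under that assumption: factor = master mean / PR mean, difference = PR mean - master mean. This definition is inferred from the reported numbers, not taken from the bot's source, and the names `pr`, `master`, `ratio`, and `difference` are made up for the example.

```julia
# Mean times in ms, copied from the GLMakie and master rows of the table above.
pr     = (create_compile = 109.63, display_compile = 420.26,
          create_run = 9.45, display_run = 25.72)
master = (create_compile = 109.24, display_compile = 653.52,
          create_run = 8.32, display_run = 25.75)

# Assumed definitions, inferred from the reported values:
# factor > 1 means the PR is faster, negative difference means time was saved.
ratio(k) = round(master[k] / pr[k]; digits=2)
difference(k) = round(pr[k] - master[k]; digits=2)

for k in keys(pr)
    println(k, ": ", ratio(k), "x, ", difference(k), " ms")
end
```

Under this reading, the 1.56x entry for GLMakie display compile time corresponds to 653.52 ms on master versus 420.26 ms on the PR branch.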

@ffreyer ffreyer added the skip-changelog label (Skips changelog enforcer) on Oct 16, 2024
@ffreyer
Collaborator

ffreyer commented Oct 16, 2024

This is failing because lines drop the model uniform (since it is not used) when linestyles are used.

@MakieBot
Collaborator

MakieBot commented Oct 29, 2024

Benchmark Results

SHA: 96d592f9586ac63ff051fb7abc3db079f8db68a2

Warning

These results are subject to substantial noise because GitHub's CI runs on shared machines that are not ideally suited for benchmarking.

GLMakie
CairoMakie
WGLMakie

@ffreyer
Collaborator

ffreyer commented Nov 4, 2024

Same benchmark code, different sorting options:

| Change/State | Time [µs] | Allocations | Allocated [KiB] |
|---|---|---|---|
| merge master | 77.2 | 727 | 25.45 |
| rollback sortby function | 90.8 | 1191 | 30.38 |
| typed sortby | 91.2 | 1191 | 30.38 |
| inline transformationmatrix(plot) | 79.6 | 758 | 29.81 |
| inline plot.model[] | 80.7 | 696 | 21.09 |
| using zvalue2d with transformationmatrix() | 84.6 | 820 | 23.03 |

Calling zvalue2d seems to have a significant overhead compared to calling what it does directly, perhaps due to runtime dispatch. plot.model[] is also slower than transformationmatrix(), maybe due to type instability. I went with calling transformationmatrix() for now, which is pretty close to the original optimization.
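To illustrate the kind of type-stability difference suspected here, the sketch below compares reading a matrix out of a `Dict{Symbol,Any}` with reading it from a concretely typed field. `LoosePlot`, `TypedPlot`, and `zvalue` are hypothetical stand-ins and not Makie types or functions; whether this is actually what slows down `plot.model[]` is not confirmed by the thread.

```julia
# Hypothetical stand-ins; these are not Makie types.
struct LoosePlot
    attributes::Dict{Symbol,Any}   # values are stored as Any
end

struct TypedPlot
    model::Matrix{Float64}         # concrete field type
end

# Element [3, 4] of a 4x4 model matrix is its z translation.
zvalue(p::LoosePlot) = p.attributes[:model][3, 4]
zvalue(p::TypedPlot) = p.model[3, 4]

model = [1.0 0 0 0; 0 1 0 0; 0 0 1 2.5; 0 0 0 1]
loose = LoosePlot(Dict{Symbol,Any}(:model => model))
typed = TypedPlot(model)

# Both return the same value...
@show zvalue(loose) zvalue(typed)

# ...but only the typed version has a concrete inferred return type,
# so callers of the loose version fall back to runtime dispatch.
inferred_loose = only(Base.return_types(zvalue, (LoosePlot,)))
inferred_typed = only(Base.return_types(zvalue, (TypedPlot,)))
@show inferred_loose inferred_typed
```

In an interactive session, `@code_warntype zvalue(loose)` shows the same information and is a common first check when a small helper costs more than expected.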

Moving some clip planes code around to make setup_clip_planes() type stable got me to 53.3 µs, 272 allocations, 18 KiB. Waiting on CI before I push that.

@@ -31,7 +31,12 @@ function render_frame(screen::Screen; resize_buffers=true)
     ShaderAbstractions.switch_context!(nw)

     function sortby(x)
-        return x[3][:model][][3, 4]
+        robj = x[3]
+        plot = screen.cache2plot[robj.id]
Member Author

The plot lookup was the expensive bit here, so would be nice if we could avoid it!

Collaborator

Dict lookups are O(1) and take somewhere around 10 ns. In my benchmarks the current solution differs from the old one by only 3%.
Using robj[:model] is also what made this PR fail: lines with linestyles apply the model matrix on the CPU, so it does not end up in the uniforms. I would also expect it to fail subtly with f32 converts, because those can change the model matrix (setting it to I after applying it on the CPU).
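The lookup-based sort key discussed in this thread can be sketched in isolation as follows. Everything here is a simplified stand-in: `FakeRenderObject`, `FakePlot`, `translate_z`, and the three-element tuple layout of the render list are assumptions for illustration, and the real `render_frame` code reads the matrix through Makie's own accessors rather than a plain field.

```julia
# Hypothetical stand-ins for render objects and plots; not GLMakie's types.
struct FakeRenderObject
    id::UInt32
end

struct FakePlot
    model::Matrix{Float64}
end

# 4x4 model matrix that only translates along z.
translate_z(z) = [1.0 0 0 0; 0 1 0 0; 0 0 1 z; 0 0 0 1]

cache2plot = Dict{UInt32,FakePlot}(
    UInt32(1) => FakePlot(translate_z(2.0)),
    UInt32(2) => FakePlot(translate_z(-1.0)),
    UInt32(3) => FakePlot(translate_z(0.5)),
)

# Assumed entry layout: the render object sits in the third slot, as in x[3].
renderlist = [(0, 1, FakeRenderObject(UInt32(i))) for i in 1:3]

function sortby(cache, x)
    robj = x[3]
    plot = cache[robj.id]        # one hash lookup per render object
    return plot.model[3, 4]      # z translation of the plot's model matrix
end

# Sort back to front by z translation and list the resulting object ids.
sorted = sort(renderlist; by = x -> sortby(cache2plot, x))
sorted_ids = [Int(x[3].id) for x in sorted]
println(sorted_ids)
```

Taking the matrix from the plot rather than from the render object's uniforms keeps the sort key available even when a backend has dropped the model uniform, which is the failure described above.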

@ffreyer ffreyer merged commit 88746d9 into master Nov 5, 2024
21 of 22 checks passed
@ffreyer ffreyer deleted the sd/frame-rendering-perf branch November 5, 2024 14:04