Improved clipping
There are some long-standing issues with the current clipping implementation. Specifically:
- Sometimes allocates larger areas for clip masks than it technically needs to (memory).
- Handles clip fast paths (e.g. corners) in a limited set of situations (performance).
- Sometimes draws more pixels to clip masks than technically required (performance).
- The current implementation is complex in places (e.g. determining whether to use a shared mask).
- Many items get placed completely in the alpha pass as soon as they have any clips (performance).
With hindsight, I think we can improve on this significantly but I could be missing something fundamental, so I'm writing this up for feedback.
The proposal below makes the clip mask generation work more like the original implementation that kvark did. Mea culpa. It also adds a couple of additional features.
Broadly speaking:
- We treat the clip regions attached to a primitive separately to the clip regions that are part of the clip-scroll tree hierarchy.
- We know that the clip region on a primitive is in local space.
- This greatly simplifies the case of splitting the primitive into segments - for example, a segment for each rounded corner, and inner segments.
- Clip masks from the clip-scroll hierarchy are treated as input image masks to a primitive mask.
- We add render tasks for the clip-scroll hierarchy items as (cacheable) children of the primitive (well, segment) clip mask render tasks.
More detail:
- In `prepare_prim_for_render` we decide whether this primitive should be drawn in segments. This is done only once for each primitive, and then stored. Doing it here means that we don't do any work until the primitive becomes visible. This involves looking at the (local space) clips attached to the primitive, and other heuristics (such as the size of the primitive, or ratio of clipped area to opaque area). Initially, we'd do this for rectangles only (since that's all the current implementation tries to handle), but this could easily be extended to other primitive types, such as images and borders.
- We iterate the list of segments (or just use the primitive screen rect if not splitting into segments), and calculate screen rects for the segments with local clip mask tasks. For example, a large rounded rect would have 4 segments for the corners, each with a clip task, and some number of internal segments. The internal segments don't have clips (although they may get clips later from the clip-scroll tree).
Performance win - the clip mask draw for these segments only needs to consider one corner, instead of evaluating all four corners as it currently does.
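To make the segment split concrete, here is a minimal sketch in Rust. It is not existing WebRender code: `LocalRect`, `SegmentKind`, `Segment` and `segment_rounded_rect` are hypothetical names, and it assumes a uniform corner radius and a simple 3x3 grid split (the proposal only says "some number of internal segments"). Only the four corner segments would need a local clip mask task.

```rust
/// Axis-aligned rect in the primitive's local space (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
struct LocalRect {
    x: f32,
    y: f32,
    w: f32,
    h: f32,
}

/// Whether a segment needs a local clip mask (Corner) or not (Inner).
#[derive(Clone, Copy, Debug, PartialEq)]
enum SegmentKind {
    Corner,
    Inner,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Segment {
    rect: LocalRect,
    kind: SegmentKind,
}

/// Split a rounded rect into a 3x3 grid: four corner segments plus up to
/// five inner segments. Assumes one uniform radius for all corners.
fn segment_rounded_rect(rect: LocalRect, radius: f32) -> Vec<Segment> {
    // Clamp the radius so opposite corners never overlap.
    let r = radius.min(rect.w * 0.5).min(rect.h * 0.5);
    if r <= 0.0 {
        // No rounding: the whole primitive is a single unclipped segment.
        return vec![Segment { rect, kind: SegmentKind::Inner }];
    }

    let xs = [rect.x, rect.x + r, rect.x + rect.w - r, rect.x + rect.w];
    let ys = [rect.y, rect.y + r, rect.y + rect.h - r, rect.y + rect.h];

    let mut segments = Vec::with_capacity(9);
    for row in 0..3 {
        for col in 0..3 {
            let w = xs[col + 1] - xs[col];
            let h = ys[row + 1] - ys[row];
            if w <= 0.0 || h <= 0.0 {
                // Degenerate cell (the radius is half the width or height).
                continue;
            }
            // Cells that touch both an outer row and an outer column are corners.
            let kind = if row != 1 && col != 1 {
                SegmentKind::Corner
            } else {
                SegmentKind::Inner
            };
            segments.push(Segment {
                rect: LocalRect { x: xs[col], y: ys[row], w, h },
                kind,
            });
        }
    }
    segments
}
```

With this split, each corner mask only has to evaluate one corner, and the inner segments carry no local clip at all.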
- For each of the segments (both with local clips and without) we now consider the clip stack, which is a list of (PackedLayerIndex, ClipNodeIndex). Each clip-scroll tree clip is represented by a screen space tile grid (e.g. 128x128 tiles). These tiles are lazily allocated and drawn to. For each segment, we can quickly determine which tiles of each complex clip in the clip stack it overlaps. Then, we create a render task for each overlapping tile of the clips in the clip stack, and add them as children of the primitive clip task. Since these tiles are cacheable and independent, they will always get drawn in pass 0, meaning each tile for a clip stack clip will only ever get drawn once. The segment clip mask then applies each of the (overlapping) clip tile masks as an image mask.
Performance and memory win - We only allocate tiles of the clip stack clips lazily. For example, consider a 1000x1000 clip with 3x3 rounded corners (which occurs often, especially on GitHub). In this case, we will only ever allocate four 128x128 tiles of that clip (one per corner). And even better, we'll only allocate those in the case where a segment of a primitive overlaps that clip region (the 3x3 clip region of interest, or the non-aligned rect clip) - if the primitive segments are completely inside the large clip, we'll never end up allocating any clip tiles at all.
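A rough sketch of the lazy tile lookup, again in Rust with hypothetical names (`DeviceRect`, `TiledClip`, `tiles_for_segment`). It assumes integer device pixels, a tile grid anchored at the clip's top-left corner, and that the only regions of interest are the four rounded corners (the non-aligned rect case is left out). The `usize` stored per tile stands in for whatever render task or cache handle the real implementation would keep.

```rust
use std::collections::HashMap;

const TILE_SIZE: i32 = 128;

/// Half-open rect in device pixels: [x0, x1) x [y0, y1).
#[derive(Clone, Copy, Debug, PartialEq)]
struct DeviceRect {
    x0: i32,
    y0: i32,
    x1: i32,
    y1: i32,
}

impl DeviceRect {
    fn intersection(&self, other: &DeviceRect) -> Option<DeviceRect> {
        let r = DeviceRect {
            x0: self.x0.max(other.x0),
            y0: self.y0.max(other.y0),
            x1: self.x1.min(other.x1),
            y1: self.y1.min(other.y1),
        };
        if r.x0 < r.x1 && r.y0 < r.y1 { Some(r) } else { None }
    }
}

/// A clip-scroll tree clip backed by a lazily allocated tile grid.
struct TiledClip {
    bounds: DeviceRect,
    /// Areas where the mask is not trivially 1.0 (here: the rounded corners).
    regions_of_interest: Vec<DeviceRect>,
    /// Tile coordinate -> placeholder id for the tile's cached render task.
    tiles: HashMap<(i32, i32), usize>,
}

impl TiledClip {
    fn rounded(bounds: DeviceRect, radius: i32) -> TiledClip {
        let DeviceRect { x0, y0, x1, y1 } = bounds;
        let r = radius;
        TiledClip {
            bounds,
            regions_of_interest: vec![
                DeviceRect { x0, y0, x1: x0 + r, y1: y0 + r },         // top-left
                DeviceRect { x0: x1 - r, y0, x1, y1: y0 + r },         // top-right
                DeviceRect { x0, y0: y1 - r, x1: x0 + r, y1 },         // bottom-left
                DeviceRect { x0: x1 - r, y0: y1 - r, x1, y1 },         // bottom-right
            ],
            tiles: HashMap::new(),
        }
    }

    /// Return the tiles this segment needs as image masks, allocating any
    /// that do not exist yet. Segments that miss every region of interest
    /// allocate nothing.
    fn tiles_for_segment(&mut self, segment: &DeviceRect) -> Vec<(i32, i32)> {
        let mut touched = Vec::new();
        for region in &self.regions_of_interest {
            let overlap = match region.intersection(segment) {
                Some(o) => o,
                None => continue,
            };
            // Convert the overlap to inclusive tile coordinates.
            let tx0 = (overlap.x0 - self.bounds.x0).div_euclid(TILE_SIZE);
            let ty0 = (overlap.y0 - self.bounds.y0).div_euclid(TILE_SIZE);
            let tx1 = (overlap.x1 - 1 - self.bounds.x0).div_euclid(TILE_SIZE);
            let ty1 = (overlap.y1 - 1 - self.bounds.y0).div_euclid(TILE_SIZE);
            for ty in ty0..=ty1 {
                for tx in tx0..=tx1 {
                    // Lazy allocation: only the first request creates the tile.
                    let next_id = self.tiles.len();
                    self.tiles.entry((tx, ty)).or_insert(next_id);
                    if !touched.contains(&(tx, ty)) {
                        touched.push((tx, ty));
                    }
                }
            }
        }
        touched
    }
}
```

For the 1000x1000 clip with 3x3 corners, a segment covering the whole clip touches tiles (0, 0), (7, 0), (0, 7) and (7, 7) under these assumptions, while a segment sitting well inside the clip touches none and allocates nothing.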
Performance win - Any segments within the primitive that are (a) opaque (b) have no local clips and (c) don't overlap clip regions of interest in the clip-scroll clips can be drawn in the opaque pass. I think this will overwhelmingly be the common case for large primitives, giving us a lot more pixels drawn in the opaque pass.
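The three conditions (a), (b) and (c) reduce to a small per-segment predicate. The sketch below uses hypothetical names (`Pass`, `SegmentInfo`, `choose_pass`) and assumes the overlap test against the clip-scroll clips has already produced a tile count for the segment.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Pass {
    Opaque,
    Alpha,
}

/// Per-segment facts gathered while preparing the primitive (hypothetical).
#[derive(Clone, Copy, Debug)]
struct SegmentInfo {
    /// (a) the primitive's content is opaque in this segment.
    is_opaque: bool,
    /// (b) the segment has a local clip mask (e.g. a rounded corner).
    has_local_clip: bool,
    /// (c) number of clip-scroll clip tiles the segment overlaps.
    overlapping_clip_tiles: usize,
}

/// A segment can go in the opaque pass only if nothing can make it translucent.
fn choose_pass(seg: &SegmentInfo) -> Pass {
    if seg.is_opaque && !seg.has_local_clip && seg.overlapping_clip_tiles == 0 {
        Pass::Opaque
    } else {
        Pass::Alpha
    }
}
```

The decision is made per segment rather than per primitive, so a large rounded rect would send only its corner segments (and any segments overlapping a clip tile) to the alpha pass.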
If this works out, I think it may remove quite a bit of implementation complexity too - it seems simpler to me conceptually.
Thoughts?