More trace and x64 optimisations #1441
Conversation
ykrt/src/compile/jitc_yk/opt/mod.rs
let Const::Int(_, v) = self.m.const_(cidx) else {
    panic!()
};
// Exceeding `u16::MAX` is undefined behaviour: we choose to saturate
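For context, a minimal standalone sketch of the saturating conversion the comment above describes; the helper name and the assumption that the constant can be read as a `u64` are illustrative, not the actual yk code:

```rust
/// Clamp a constant to the width of the destination field rather than
/// wrapping, so an oversized value maps to `u16::MAX` instead of silently
/// changing.
fn saturate_to_u16(v: u64) -> u16 {
    u16::try_from(v).unwrap_or(u16::MAX)
}

fn main() {
    assert_eq!(saturate_to_u16(42), 42);
    assert_eq!(saturate_to_u16(1 << 20), u16::MAX);
}
```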
I wasn't sure where the `u16::MAX` constraint comes from.
It's totally arbitrary: anything above the bit width of the type is UB.
> anything above the bit width of the type is UB.
If I'm understanding correctly, LLVM `getelementptr` (which is where `dyn_ptradd` comes from) allows you to compute pointers to anywhere in memory regardless of the type you are operating on.
The docs say:
> The result value of the getelementptr may be outside the object pointed to by the base pointer.
Hopefully I haven't gotten the wrong end of the stick.
Oops, sorry, I misread what you were commenting on!
The docs don't seem to specify a maximum offset, but there obviously must be a maximum. I assumed, based on some other things, that a `u16` offset might be enough, but I admit that I didn't follow up on this. Any thoughts as to the maximum offset?
I think the maximum offset is determined by what LLVM calls the "pointer index type" for the address space in question.
It's documented in that same section of the docs (the same passage that I was fretting over in MM yesterday):
> The indices are first converted to offsets in the pointer’s index type. If the currently indexed type is a struct type, the struct offset corresponding to the index is sign-extended or truncated to the pointer index type. Otherwise, the index itself is sign-extended or truncated, and then multiplied by the type allocation size (that is, the size rounded up to the ABI alignment) of the currently indexed type.
> The offsets are then added to the low bits of the base address up to the index type width, with silently-wrapping two’s complement arithmetic. If the pointer size is larger than the index size, this means that the bits outside the index type width will not be affected.
For us, it's important to know:
- We only allow GEP on address space zero, and our IR serialiser will reject all other GEPs.
- We assume and assert that the pointer index type is the same size as a pointer for address space zero.
Under these assumptions, I think the maximum offset is that expressible by a signed 64-bit integer, not an unsigned 16-bit one. I think we are also expected to sign-extend the index if it's smaller. But please check my reasoning.
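To make my reading concrete, here's a minimal sketch of that computation under the assumptions above (address space zero, 64-bit pointers, pointer index type also 64 bits wide); the function and parameter names are made up, not yk's API:

```rust
/// Compute a GEP-style address: sign-extend the index to the pointer index
/// type, multiply by the allocation size of the indexed type, then add to the
/// base address, all with silently-wrapping two's complement arithmetic.
fn gep_addr(base: u64, index: i32, alloc_size: u64) -> u64 {
    let idx = i64::from(index); // sign-extend to the pointer index type
    let off = idx.wrapping_mul(alloc_size as i64); // wrapping multiply
    base.wrapping_add(off as u64) // wrapping add to the low bits of the base
}

fn main() {
    // A negative index wraps below the base address.
    assert_eq!(gep_addr(0x1000, -1, 8), 0x0FF8);
}
```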
OK, so this is messy because we have an `i32` limit, so we hit "badness" before LLVM says we should hit UB. 2e73f8c is my attempt to split the difference...
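Roughly the shape of guard I have in mind (a sketch, not the actual contents of 2e73f8c; `fits_in_i32` is a made-up name): only fold the constant offset when it fits our `i32` encoding, and otherwise leave the instruction alone.

```rust
/// Return the offset as an `i32` if it fits our encoding, otherwise `None`
/// so the caller can skip the optimisation rather than risk a bad encoding.
fn fits_in_i32(off: i64) -> Option<i32> {
    i32::try_from(off).ok()
}

fn main() {
    assert_eq!(fits_in_i32(-4096), Some(-4096));
    assert_eq!(fits_in_i32(i64::from(i32::MAX) + 1), None);
}
```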
Hrm. Good point.
Let's roll with this for now.
Looks good. Just a couple of comments.
Please squash.
Squashed.
This PR bundles up a number of optimisations that benefit big_loop.lua: mostly trace optimisations, but also one x64 optimisation (and one bug fix). With a suitable `yk_promote` in yklua, this PR speeds big_loop up by about 43%. The main reason for this is that we are able to constant fold a lot of goop around instruction decoding, so a lot of stuff in `trace-pre-opt` disappears entirely by `trace-post-opt`.