More trace and x64 optimisations #1441
Conversation
ykrt/src/compile/jitc_yk/opt/mod.rs
let Const::Int(_, v) = self.m.const_(cidx) else {
    panic!()
};
// Exceeding `u16::MAX` is undefined behaviour: we choose to saturate
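For context, a minimal standalone sketch of the saturating conversion the comment above describes; the helper name and the assumption that the constant can be read as a `u64` are illustrative, not the actual yk code:

```rust
/// Clamp a constant to the width of the destination field rather than
/// wrapping, so an oversized value maps to `u16::MAX` instead of silently
/// changing.
fn saturate_to_u16(v: u64) -> u16 {
    u16::try_from(v).unwrap_or(u16::MAX)
}

fn main() {
    assert_eq!(saturate_to_u16(42), 42);
    assert_eq!(saturate_to_u16(1 << 20), u16::MAX);
}
```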
I wasn't sure where the `u16::MAX` constraint comes from.
It's totally arbitrary: anything above the bit width of the type is UB.
> anything above the bit width of the type is UB.
If I'm understanding correctly, LLVM `getelementptr` (which is where `dyn_ptradd` comes from) allows you to compute pointers to anywhere in memory regardless of the type you are operating on.
The docs say:
> The result value of the getelementptr may be outside the object pointed to by the base pointer.
Hopefully I haven't gotten the wrong end of the stick.
Oops, sorry, I misread what you were commenting on!
The docs don't seem to specify a maximum offset, but there obviously must be a maximum. I assumed, based on some other things, that a `u16` offset might be enough, but I admit that I didn't follow up on this. Any thoughts as to the maximum offset?
I think the maximum offset is determined by what LLVM calls the "pointer index type" for the address space in question.
It's documented in that same section of the docs (the same passage that I was fretting over in MM yesterday):
> The indices are first converted to offsets in the pointer’s index type. If the currently indexed type is a struct type, the struct offset corresponding to the index is sign-extended or truncated to the pointer index type. Otherwise, the index itself is sign-extended or truncated, and then multiplied by the type allocation size (that is, the size rounded up to the ABI alignment) of the currently indexed type.
> The offsets are then added to the low bits of the base address up to the index type width, with silently-wrapping two’s complement arithmetic. If the pointer size is larger than the index size, this means that the bits outside the index type width will not be affected.
For us, it's important to know:
- We only allow GEP on address space zero, and our IR serialiser will reject all other GEPs.
- We assume and assert that the pointer index type is the same size as a pointer for address space zero.
Under these assumptions, I think the maximum offset is that expressible by a signed 64-bit integer, not an unsigned 16-bit one. I think we are also expected to sign-extend the index if it's smaller. But please check my reasoning.
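To make my reading concrete, here's a minimal sketch of that computation under the assumptions above (address space zero, 64-bit pointers, pointer index type also 64 bits wide); the function and parameter names are made up, not yk's API:

```rust
/// Compute a GEP-style address: sign-extend the index to the pointer index
/// type, multiply by the allocation size of the indexed type, then add to the
/// base address, all with silently-wrapping two's complement arithmetic.
fn gep_addr(base: u64, index: i32, alloc_size: u64) -> u64 {
    let idx = i64::from(index); // sign-extend to the pointer index type
    let off = idx.wrapping_mul(alloc_size as i64); // wrapping multiply
    base.wrapping_add(off as u64) // wrapping add to the low bits of the base
}

fn main() {
    // A negative index wraps below the base address.
    assert_eq!(gep_addr(0x1000, -1, 8), 0x0FF8);
}
```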
OK, so this is messy because we have an `i32` limit, so we hit "badness" before LLVM says we should hit UB. 2e73f8c is my attempt to split the difference...
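Roughly the shape of guard I have in mind (a sketch, not the actual contents of 2e73f8c; `fits_in_i32` is a made-up name): only fold the constant offset when it fits our `i32` encoding, and otherwise leave the instruction alone.

```rust
/// Return the offset as an `i32` if it fits our encoding, otherwise `None`
/// so the caller can skip the optimisation rather than risk a bad encoding.
fn fits_in_i32(off: i64) -> Option<i32> {
    i32::try_from(off).ok()
}

fn main() {
    assert_eq!(fits_in_i32(-4096), Some(-4096));
    assert_eq!(fits_in_i32(i64::from(i32::MAX) + 1), None);
}
```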
Hrm. Good point.
Let's roll with this for now.
Looks good. Just a couple of comments.
Please squash.
Squashed.
This PR bundles up a number of optimisations that benefit big_loop.lua: mostly trace optimisations, but also one x64 optimisation (and one bug fix). With a suitable `yk_promote` in yklua, this PR speeds big_loop up by about 43%. The main reason for this is that we are able to constant fold a lot of goop around instruction decoding, so a lot of stuff in `trace-pre-opt` disappears entirely by `trace-post-opt`.