Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More trace and x64 optimisations #1441

Merged
merged 10 commits into from
Oct 29, 2024
Merged

More trace and x64 optimisations #1441

merged 10 commits into from
Oct 29, 2024

Conversation

ltratt
Copy link
Contributor

@ltratt ltratt commented Oct 29, 2024

This PR bundles up a number of optimisations that benefit big_loop.lua: mostly trace optimisations, but also one x64 optimisation (and one bug fix). With a suitable yk_promote in yklua, this PR speeds big_loop up by about 43%. The main reason for this is that we are able to constant fold a lot of goop around instruction decoding, so a lot of stuff in trace-pre-opt disappears entirely by trace-post-opt.

let Const::Int(_, v) = self.m.const_(cidx) else {
panic!()
};
// Exceeding `u16::MAX` is undefined behaviour: we choose to saturate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure where the u16::MAX constraint comes from.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's totally arbitrary: anything above the bit width of the type is UB.

Copy link
Contributor

@vext01 vext01 Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anything above the bit width of the type is UB.

If I'm understanding correctly, LLVM getelementptr (which is where dyn_ptradd comes from) allows you to compute pointers to anywhere in memory regardless of the type you are operating on.

The docs say:

The result value of the getelementptr may be outside the object pointed to by the base pointer.

Hopefully I haven't gotten the wrong end of the stick.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, sorry I misread where you were commenting on!

The docs don't seem to specify a maximum offset, but there obviously must be a maximum. I assumed, based on some other things, that a u16 offset might be enough, but I admit that I didn't follow up on this. Any thoughts as toe the maximum offset?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the maximum offset is determined by what LLVM calls the "pointer index type" for the address space in question.

It's documented in that same section of the docs (the same passage that I was fretting over in MM yesterday):

The indices are first converted to offsets in the pointer’s index type. If the currently indexed type is a struct type, the struct offset corresponding to the index is sign-extended or truncated to the pointer index type. Otherwise, the index itself is sign-extended or truncated, and then multiplied by the type allocation size (that is, the size rounded up to the ABI alignment) of the currently indexed type.

The offsets are then added to the low bits of the base address up to the index type width, with silently-wrapping two’s complement arithmetic. If the pointer size is larger than the index size, this means that the bits outside the index type width will not be affected.

For us, it's important to know:

Under these assumptions, I think the maximum offset is expressed by a signed 64-bit integer, not an unsigned 16-bit one. I think we are also expected to sign-extend the index if it's smaller. But please check my reasoning.

Copy link
Contributor Author

@ltratt ltratt Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so this is messy because we have an i32 limit, so we hit "badness" before LLVM says we should hit UB. 2e73f8c is my attempt to split the difference...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm. Good point.

Let's roll with this for now.

@vext01
Copy link
Contributor

vext01 commented Oct 29, 2024

Looks good. Just a couple of comments.

@vext01
Copy link
Contributor

vext01 commented Oct 29, 2024

Please squash.

@ltratt
Copy link
Contributor Author

ltratt commented Oct 29, 2024

Squashed.

@vext01 vext01 added this pull request to the merge queue Oct 29, 2024
Merged via the queue into ykjit:master with commit 9afc721 Oct 29, 2024
2 checks passed
@ltratt ltratt deleted the opt_opt_opt branch October 30, 2024 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants