Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: BatchIt #637

Closed
wants to merge 17 commits into from
Closed

WIP: BatchIt #637

wants to merge 17 commits into from

Conversation

nwf-msr
Copy link
Contributor

@nwf-msr nwf-msr commented Sep 21, 2023

As part of the assessment of #634, but also perhaps more generally useful. Opinions welcome.

@nwf-msr nwf-msr requested a review from mjp41 September 21, 2023 03:14
@nwf-msr nwf-msr force-pushed the 202309-msgpass branch 4 times, most recently from 8debce9 to a26a036 Compare September 22, 2023 21:58
@mjp41
Copy link
Member

mjp41 commented Sep 24, 2023

@nwf-msr nwf-msr force-pushed the 202309-msgpass branch 2 times, most recently from 81147f0 to 998c2f4 Compare September 25, 2023 20:36
@nwf-msr
Copy link
Contributor Author

nwf-msr commented Sep 25, 2023

Looks like it is leaking in some case https://github.com/microsoft/snmalloc/actions/runs/6279469686/job/17055222812?pr=637#step:7:149

Whoops; I had the loop termination conditions wrong. They're fixed now, I think. Let's see if CI agrees.

@nwf-msr nwf-msr changed the title WIP: msgpass test WIP: msgpass test and related changes Nov 16, 2023
@nwf-msr
Copy link
Contributor Author

nwf-msr commented Nov 16, 2023

After discussions with @mjp41 yesterday, I've introduced a notion of "tweakable obfuscation" and have made all the intra-slab free lists' backwards signatures use the address of the slab metadata as the "tweak". The next step would be to remove the per-thread keys and have everyone use a common global key (probably not RemoteAllocator::key_global!) and apply the same tweaking. This opens the door to sending threads being able to build up segments of slab free lists that can be spliced in by the recipient in O(1) rather than O(n).

@nwf-msr
Copy link
Contributor Author

nwf-msr commented Dec 14, 2023

I've (at long last) got things flying end to end with a very simple "cache" on the sending side -- a single open ring -- but I think some review and investigation is a good idea. Here's what mimalloc-bench makes of the current state of things in terms of time
image
and memory
image

@nwf-msr
Copy link
Contributor Author

nwf-msr commented May 23, 2024

We should:

  1. figure out how to make the caching layer optional, which probably just means some more
    std::conditional_t use.
  2. get someone with chops to assess the "tweaked obfuscation" changes.
  3. offer to randomize deallocator cache construction order (Matt writes: "as we are building a ring, we can add to the start or the end, so perhaps we could at least build an unpredictable order in the ring")
  4. offer probabilistic premature eviction from the deallocator caches to further thwart attempts to control free-list order

@nwf-msr
Copy link
Contributor Author

nwf-msr commented Sep 4, 2024

OK, here's a version with, I think, all todo-s at least prototyped. BatchIt can be turned off (see the "Make BatchIt optional" commit and/or CMakeLists.txt) and it now attempts to play nicely with mitigations(random_preserve) by stochastically evicting batches during construction.

@nwf-msr nwf-msr force-pushed the 202309-msgpass branch 4 times, most recently from 0bc5166 to 1520b2d Compare September 10, 2024 14:35
@nwf-msr nwf-msr force-pushed the 202309-msgpass branch 2 times, most recently from 4ce7feb to d69cbc4 Compare September 12, 2024 14:03
Plumb its use around remoteallocator and remotecache
This prepares the recipient to process a batched message.
Exercise recipient machinery by having the senders collect adjacent frees to
the same slab into a batch.
This might involve multiple (I think at most two, at the moment) transitions in
the slab lifecycle state machine.  Towards that end, return indicators to the
caller that the slow path must be taken and how many objects of the original
set have not yet been counted as returned.
We can, as Matt so kindly reminds me, go get them from the pagemap.  Since we
need this value only when closing a ring, the read from over there is probably
not very onerous.  (We could also get the slab pointer from an object in the
ring, but we need that whenever inserting into the cache, so it's probably more
sensible to store that locally?)
Move ring set bits and associativity knobs to allocconfig and expose them via
CMake.  If associtivity is zero, use non-batched implementations of the
RemoteMessage and RemoteDeallocCacheBatching classes.

By default, kick BatchIt on when we have enough room in the minimum allocation
size to do it.  Exactly how much space is enough is a function of which
mitigations we have enabled and whether or not we are compiling with C++20.
There's no need for a full pointer here, it'd just make the structure larger on
CHERI.
In order not to thwart `mitigations(random_preserve)` too much, if it's on in
combination with BatchIt, roll the dice every time we append to a batch to
decide if we should stochastically evict this batch.  By increasing the number
of batches, we allow the recipient allocator increased opportunity to randomly
stripe batches across the two `freelist::Builder` segments associated with each
slab.
@nwf-msr
Copy link
Contributor Author

nwf-msr commented Sep 12, 2024

Rebased after merging #675. Trees are identical, just commit objects differ.

@nwf nwf mentioned this pull request Sep 18, 2024
@nwf-msr nwf-msr closed this Sep 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants